Re: status of inline deduplication in btrfs

2017-08-28 Thread Duncan

shally verma posted on Mon, 28 Aug 2017 12:49:10 +0530 as excerpted:

> On Sat, Aug 26, 2017 at 9:45 PM, Adam Borowski wrote:
>> On Sat, Aug 26, 2017 at 01:36:35AM +, Duncan wrote:
>>> The second has to do with btrfs scaling issues due to reflinking,
>>> which of course is the operational mechanism for both snapshotting and
>>> dedup.  Snapshotting of course reflinks the entire subvolume, so it's
>>> reflinking on a /massive/ scale.  While normal file operations aren't
>>> affected much, btrfs maintenance operations such as balance and check
>>> scale badly enough with snapshotting (due to the reflinking) that
>>> keeping the number of snapshots per subvolume under 250 or so is
>>> strongly recommended, and keeping them to double-digits or even
>>> single-digits is recommended if possible.
>>>
>>> Dedup works by reflinking as well, but its effect on btrfs maintenance
>>> will be far more variable, depending of course on how effective the
>>> deduping, and thus the reflinking, is.  But considering that
>>> snapshotting is effectively 100% effective deduping of the entire
>>> subvolume (until the snapshot and active copy begin to diverge, at
>>> least), that tends to be the worst case, so figuring a full two-copy
>>> dedup as equivalent to one snapshot is a reasonable estimate of
>>> effect.  If dedup only catches 10%, only once, then it would be 10% of
>>> a snapshot's effect.  If it's 10% but there's 10 duplicated instances,
>>> that's the effect of a single snapshot.  Assuming of course that the
>>> dedup domain is the same as the subvolume that's being snapshotted.
> 
> This looks to me like a debate between using inline dedup vs.
> snapshotting, or more precisely, doing dedupe via snapshots.
> Did I understand that correctly? If yes, does it mean people are still
> undecided about whether the current design and proposal for inline dedup
> is the right way to go?

Not that I'm aware of, and it wasn't my intent to leave that impression.

What I'm saying is that btrfs uses the same underlying mechanism, 
reflinking, for both snapshotting and dedup.

A rather limited but perhaps useful analogy from an /entirely/ different 
area might be that both single-person bicycles and full-size truck/
trailer rigs use the same underlying mechanism, wheels with tires turning 
against the ground, to move, while they have vastly different uses and 
neither one can replace the other.

And just as the tire common to both has the limitation that it can be 
punctured and go flat, a limitation that applies to both vehicles because 
of the shared mechanism they use to move, so reflinking has certain 
limitations that apply to both snapshotting and dedup, due to the common 
mechanism used in the implementation.

Of course taking the analogy much further than that will likely result in 
comically absurd conclusions, but hopefully when kept within its limits 
it's useful to convey my point: two technologies with very different 
usage at the surface level, taking advantage of a common implementation 
mechanism underneath.

And because the underlying mechanism is the same, its limits become the 
limits of both overlying solutions, however they otherwise differ.
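
To put rough numbers on the estimate quoted above, here is a minimal Python
sketch.  The function name and the simple linear model are only shorthand
for the rule of thumb (a full two-copy dedup counts as one snapshot); they
are assumptions for illustration, not anything btrfs itself computes.

```python
def snapshot_equivalents(dedup_fraction, extra_copies):
    """Rough reflink load of dedup, in units of "one full snapshot".

    dedup_fraction: share of the subvolume's data that dedup matched (0.0-1.0).
    extra_copies:   number of duplicate instances folded into shared extents.
    Hypothetical linear model of the rule of thumb, not a btrfs formula.
    """
    if not 0.0 <= dedup_fraction <= 1.0:
        raise ValueError("dedup_fraction must be between 0 and 1")
    if extra_copies < 0:
        raise ValueError("extra_copies must not be negative")
    return dedup_fraction * extra_copies


if __name__ == "__main__":
    # The three cases from the estimate: full two-copy dedup, 10% once,
    # and 10% with 10 duplicated instances.
    for fraction, copies in [(1.0, 1), (0.10, 1), (0.10, 10)]:
        print(f"{fraction:.0%} deduped, {copies} instance(s): "
              f"~{snapshot_equivalents(fraction, copies):.2f} snapshot(s)")
```

Adding the result to the real snapshot count gives a ballpark figure to
hold against the snapshots-per-subvolume guideline.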

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: status of inline deduplication in btrfs

2017-08-28 Thread Austin S. Hemmelgarn


On 2017-08-28 06:32, Adam Borowski wrote:
> On Mon, Aug 28, 2017 at 12:49:10PM +0530, shally verma wrote:
>> I'm a bit confused here: is your description based on offline dedupe,
>> or on inline deduplication?
>
> It doesn't matter _how_ you get to excessive reflinking, the resulting
> slowdown is the same.
>
> By the way, you can try "bees", it does nearline-dedupe which is for
> practical purposes as good as fully online, and unlike the latter, has no
> way to damage your data in case of bugs (mistaken userland dedupe can at
> most make the kernel pointlessly read and compare data).
>
> I haven't tried it myself, but what it does is dedupe using FILE_EXTENT_SAME
> asynchronously right after a write gets put into the page cache, which in
> most cases is quick enough to avoid writeout.

I would also recommend looking at 'bees'.  If you absolutely _must_ have 
online or near-online deduplication, then this is your best option 
currently from a data safety perspective.

That said, it's worth pointing out that in-line deduplication is not 
always the best answer.  In fact, it's quite often a sub-optimal answer 
compared to a combination of compression, sparse files, and batch 
deduplication.  Compression and usage of sparse files will get you about 
the same space savings most of the time as in-line deduplication (I've 
tested this on ZFS on FreeBSD using native in-line deduplication, and 
with BTRFS on Linux using bees) while using much less memory, and about 
the same amount of processor time.  In the event that you need better 
space savings than that, you're better off using batch deduplication 
because it gives you better control over when you're using more system 
resources and will often get better overall results than in-line 
deduplication.


Re: status of inline deduplication in btrfs

2017-08-28 Thread Adam Borowski

On Mon, Aug 28, 2017 at 12:49:10PM +0530, shally verma wrote:
> I'm a bit confused here: is your description based on offline dedupe,
> or on inline deduplication?

It doesn't matter _how_ you get to excessive reflinking, the resulting
slowdown is the same.

By the way, you can try "bees", it does nearline-dedupe which is for
practical purposes as good as fully online, and unlike the latter, has no
way to damage your data in case of bugs (mistaken userland dedupe can at
most make the kernel pointlessly read and compare data).

I haven't tried it myself, but what it does is dedupe using FILE_EXTENT_SAME
asynchronously right after a write gets put into the page cache, which in
most cases is quick enough to avoid writeout.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
⠈⠳⣄ 

Re: status of inline deduplication in btrfs

2017-08-28 Thread shally verma

On Sat, Aug 26, 2017 at 9:45 PM, Adam Borowski wrote:
> On Sat, Aug 26, 2017 at 01:36:35AM +, Duncan wrote:
>> The second has to do with btrfs scaling issues due to reflinking, which
>> of course is the operational mechanism for both snapshotting and dedup.
>> Snapshotting of course reflinks the entire subvolume, so it's reflinking
>> on a /massive/ scale.  While normal file operations aren't affected much,
>> btrfs maintenance operations such as balance and check scale badly enough
>> with snapshotting (due to the reflinking) that keeping the number of
>> snapshots per subvolume under 250 or so is strongly recommended, and
>> keeping them to double-digits or even single-digits is recommended if
>> possible.
>>
>> Dedup works by reflinking as well, but its effect on btrfs maintenance
>> will be far more variable, depending of course on how effective the
>> deduping, and thus the reflinking, is.  But considering that snapshotting
>> is effectively 100% effective deduping of the entire subvolume (until the
>> snapshot and active copy begin to diverge, at least), that tends to be
>> the worst case, so figuring a full two-copy dedup as equivalent to one
>> snapshot is a reasonable estimate of effect.  If dedup only catches 10%,
>> only once, then it would be 10% of a snapshot's effect.  If it's 10% but
>> there's 10 duplicated instances, that's the effect of a single snapshot.
>> Assuming of course that the dedup domain is the same as the subvolume
>> that's being snapshotted.

This looks to me like a debate between using inline dedup vs. snapshotting,
or more precisely, doing dedupe via snapshots.
Did I understand that correctly? If yes, does it mean people are still
undecided about whether the current design and proposal for inline dedup
is the right way to go?

>
> Nope, snapshotting is not anywhere near the worst case of dedup:
>
> [/]$ find /bin /sbin /lib /usr /var -type f -exec md5sum '{}' +|
> cut -d' ' -f1|sort|uniq -c|sort -nr|head
>
> Even on the system parts (ie, ignoring my data) of my desktop, top files
> have the following dup counts: 532 384 373 164 123 122 101.  On this small
> SSD, the system parts are reflinked by snapshots with 10 dailies, and by
> deduping with 10 regular chroots, 11 sbuild chroots and 3 full-system lxc
> containers (chroots are mostly a zoo of different architectures).
>
> This is nothing compared to the backup server, which stores backups of 46
> machines (only system/user and small data, bulky stuff is backed up
> elsewhere), 24 snapshots each (a mix of dailies, 1/11/21, monthlies and
> yearly).  This worked well enough until I made the mistake of deduping the
> whole thing.
>
> But, this is still not the worst horror imaginable.  I'd recommend using
> whole-file dedup only as this avoids this pitfall: take two VM images, run
> block dedup on them.  Identical blocks in them will be cross-reflinked.  And
> there's _many_.  The vast majority of duplicate blocks are all-zero: I just
> ran fallocate -d on a 40G win10 VM and it shrank to 19G.  AFAIK
> file_extent_same is not yet smart enough to dedupe them to a hole instead.
>

I'm a bit confused here: is your description based on offline dedupe,
or on inline deduplication?

Thanks
Shally

>
> Meow!
> --
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
> ⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
> ⠈⠳⣄

Re: status of inline deduplication in btrfs

2017-08-26 Thread Adam Borowski

On Sat, Aug 26, 2017 at 01:36:35AM +, Duncan wrote:
> The second has to do with btrfs scaling issues due to reflinking, which 
> of course is the operational mechanism for both snapshotting and dedup.  
> Snapshotting of course reflinks the entire subvolume, so it's reflinking 
> on a /massive/ scale.  While normal file operations aren't affected much, 
> btrfs maintenance operations such as balance and check scale badly enough 
> with snapshotting (due to the reflinking) that keeping the number of 
> snapshots per subvolume under 250 or so is strongly recommended, and 
> keeping them to double-digits or even single-digits is recommended if 
> possible.
> 
> Dedup works by reflinking as well, but its effect on btrfs maintenance 
> will be far more variable, depending of course on how effective the 
> deduping, and thus the reflinking, is.  But considering that snapshotting 
> is effectively 100% effective deduping of the entire subvolume (until the 
> snapshot and active copy begin to diverge, at least), that tends to be 
> the worst case, so figuring a full two-copy dedup as equivalent to one 
> snapshot is a reasonable estimate of effect.  If dedup only catches 10%, 
> only once, then it would be 10% of a snapshot's effect.  If it's 10% but 
> there's 10 duplicated instances, that's the effect of a single snapshot.  
> Assuming of course that the dedup domain is the same as the subvolume 
> that's being snapshotted.

Nope, snapshotting is not anywhere near the worst case of dedup:

[/]$ find /bin /sbin /lib /usr /var -type f -exec md5sum '{}' +|
cut -d' ' -f1|sort|uniq -c|sort -nr|head

Even on the system parts (ie, ignoring my data) of my desktop, top files
have the following dup counts: 532 384 373 164 123 122 101.  On this small
SSD, the system parts are reflinked by snapshots with 10 dailies, and by
deduping with 10 regular chroots, 11 sbuild chroots and 3 full-system lxc
containers (chroots are mostly a zoo of different architectures).
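
The same census can be sketched in Python for anyone who prefers that to the
shell pipeline above.  This is only a rough equivalent under the same
assumptions (MD5 of whole-file contents, regular files only); the function
names are made up for illustration.

```python
import hashlib
import os
from collections import Counter


def file_digest(path, chunk_size=1 << 20):
    """MD5 of a file's contents, read in chunks to keep memory flat."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dup_counts(roots, top=10):
    """Return the `top` highest counts of files sharing identical contents."""
    counts = Counter()
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                # Like `find -type f`: regular files only, no symlinks.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                try:
                    counts[file_digest(path)] += 1
                except OSError:
                    continue  # unreadable file; skip it
    return [n for _digest, n in counts.most_common(top)]


if __name__ == "__main__":
    print(dup_counts(["/bin", "/sbin", "/lib", "/usr", "/var"]))
```

Each number is how many files share one content hash, so it is also how
many references a whole-file dedup would fold onto the same extents.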

This is nothing compared to the backup server, which stores backups of 46
machines (only system/user and small data, bulky stuff is backed up
elsewhere), 24 snapshots each (a mix of dailies, 1/11/21, monthlies and
yearly).  This worked well enough until I made the mistake of deduping the
whole thing.

But, this is still not the worst horror imaginable.  I'd recommend using
whole-file dedup only as this avoids this pitfall: take two VM images, run
block dedup on them.  Identical blocks in them will be cross-reflinked.  And
there's _many_.  The vast majority of duplicate blocks are all-zero: I just
ran fallocate -d on a 40G win10 VM and it shrank to 19G.  AFAIK
file_extent_same is not yet smart enough to dedupe them to a hole instead.
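
To get a feel for how much of an image is all-zero before running block
dedup on it, a small Python sketch like the following can help.  The 4096
byte block size is an assumption (match it to your filesystem), and the
function name is made up for illustration.

```python
def zero_block_stats(path, block_size=4096):
    """Return (zero_blocks, total_blocks) for the file at `path`."""
    zero = bytes(block_size)
    zero_blocks = total_blocks = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total_blocks += 1
            # A short final block counts as zero if all its bytes are zero.
            if block == zero[:len(block)]:
                zero_blocks += 1
    return zero_blocks, total_blocks
```

Holes that already exist read back as zeros too, so the count covers both
existing holes and zero-filled blocks that fallocate -d could still punch
out.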

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
⠈⠳⣄ 

Re: status of inline deduplication in btrfs

2017-08-25 Thread Duncan

shally verma posted on Fri, 25 Aug 2017 23:01:10 +0530 as excerpted:

> On Thu, Aug 24, 2017 at 6:39 AM, Tsutomu Itoh wrote:
>> On 2017/08/23 23:52, shally verma wrote:
>>> Hi,
>>>
>>> Through the btrfs wiki, I learned about the inline dedup patch and
>>> this git location, https://github.com/adam900710/linux, but I am not
>>> sure what its progress and status are. Could anyone please confirm the
>>> status of inline deduplication in btrfs, and whether this is the
>>> correct location to follow its support?
>>
>> Lu Fengqi has posted the latest patchset (v14.4).
>> https://marc.info/?l=linux-btrfs&m=149984943031184&w=2
>>
>> Unfortunately, it has not been committed yet.
>>
> Thanks for your response; I will go through the patches. Could you also
> help answer my question about the progress and status of this work? Do
> we have any test run reports that describe its stability, performance
> metrics, and other known issues, and possibly a roadmap for committing
> it?

I'm not a dev, just a btrfs user and list regular myself, and don't 
remember seeing a mainline-merge roadmap, tho dedup's not part of my own 
use-case so I could have missed it.

But I can answer some of the other questions based on what I've seen on-
list...

First, while I don't have a merge-roadmap, I do know there's some major 
dev-sponsoring corporate interest in dedup, so the feature should be on 
the fast-track to merge, and it should get pretty good testing and 
bugfixing as well.

That said, as any new feature, it's likely to take a few kernel cycles 
after merge to settle down, and my own rule-of-thumb recommendation for 
new feature stability is wait at least 3-6 kernel cycles after merge 
before considering a feature for anything but testing, and then, check 
the list for current status before relying on it.

It's worth noting that with raid56, after feature-completion in 3.19 
(IIRC), it took two kernel cycles to work out the immediate bugs, and 
only at about 5-6 cycles, basically a year later, did the alarm bells 
really start going off that there were still very serious problems with 
it.  Those problems were fixed only very recently (4.12 IIRC), and even 
now after the fix, due to btrfs implementation peculiarities, the 
infamous parity-raid write hole negates some of the btrfs data 
checksumming and integrity features that are otherwise major advantages 
of btrfs, a problem that's going to require some tweaks to the 
implementation to fix.

So basically, wait a year after merge and ask what the status is then if 
your use-case can't afford either live-failover (to something /not/ using 
the feature) or the down-time to restore from backup.  Because a year out 
is sometimes how long it takes for normally hidden but potentially quite 
nasty bugs to show up...

As for performance...

The in-band dedup is designed to be fast, with limited memory usage, 
rather than slow and thorough.  It won't catch all dups, only those where 
the original data extent has been used recently enough for its hashes to 
still be in the in-memory inline-dedup cache, so it's opportunistic and 
should be very close to the same speed as non-deduped IO.  This contrasts 
with the out-of-band dedup, which is far more thorough, relying on a 
larger on-storage cache, thus potentially making it slower but much more 
likely to catch dups.
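
As a toy illustration of that opportunistic behavior, here is a short
Python sketch of a size-limited hash cache.  It is not taken from the
patchset: the class name, the LRU eviction policy and the use of SHA-256
are all assumptions made for the example.

```python
import hashlib
from collections import OrderedDict


class InbandDedupCache:
    """Toy LRU map from block hash to the location of an earlier copy."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._seen = OrderedDict()

    def write(self, location, block):
        """Return the location of a cached identical block, or None on a miss."""
        key = hashlib.sha256(block).digest()
        hit = self._seen.get(key)
        if hit is not None:
            self._seen.move_to_end(key)  # mark as recently used
            return hit
        self._seen[key] = location
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # evict the least recently used hash
        return None
```

With room for only two hashes, writing block A twice in a row finds the
duplicate, but writing A, B, C and then A again misses it, because A's hash
was evicted in the meantime.  That missed dup is the price of the bounded
memory use.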

There are two big caveats, both related to the way dedup works its magic, 
via reflinks.  The first is fragmentation due to the block-based dedup.  
It should be easily anticipated by anyone familiar with block-based 
filesystems and the hows and whys of fragmentation in general.  But 
fragmentation tends to be more of an issue on COW-based filesystems, 
particularly where the write pattern includes heavy file-internal 
rewrites, and dedup has the potential to exacerbate that even further, 
since it may well pick blocks from multiple files and extents if they 
happen to be duplicated blocks, used recently enough to still be 
in-cache.

Of course you can manually defrag, but that breaks the reflinks and thus 
re-duplicates the data (regardless of whether it was shared due to dedup 
or to snapshotting).  The autodefrag mount option should help at less 
cost than manual defrag, because it only triggers during write and will 
only try to COW somewhat larger extents than the single block that would 
otherwise be COWed if that was all that was rewritten, but it'll still 
affect dedup efficiency, just less so than a manual defrag.  So it's a 
trade-off.

The second has to do with btrfs scaling issues due to reflinking, which 
of course is the operational mechanism for both snapshotting and dedup.  
Snapshotting of course reflinks the entire subvolume, so it's reflinking 
on a /massive/ scale.  While normal file operations aren't affected much, 
btrfs maintenance operations such as balance and check scale badly enough 
with snapshotting (due to the reflinking) that keeping the number of 
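snapshots per subvolume under 250 or so is strongly recommended, and 
keeping them to double-digits or even single-digits is recommended if 
possible.

Dedup works by reflinking as well, but its effect on btrfs maintenance 
will be far more variable, depending of course on how effective the 
deduping, and thus the reflinking, is.  But considering that snapshotting 
is effectively 100% effective deduping of the entire subvolume (until the 
snapshot and active copy begin to diverge, at least), that tends to be 
the worst case, so figuring a full two-copy dedup as equivalent to one 
snapshot is a reasonable estimate of effect.  If dedup only catches 10%, 
only once, then it would be 10% of a snapshot's effect.  If it's 10% but 
there's 10 duplicated instances, that's the effect of a single snapshot.  
Assuming of course that the dedup domain is the same as the subvolume 
that's being snapshotted.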

Re: status of inline deduplication in btrfs

2017-08-25 Thread shally verma

On Thu, Aug 24, 2017 at 6:39 AM, Tsutomu Itoh wrote:
> On 2017/08/23 23:52, shally verma wrote:
>> Hi,
>>
>> Through the btrfs wiki, I learned about the inline dedup patch and this
>> git location, https://github.com/adam900710/linux, but I am not sure
>> what its progress and status are. Could anyone please confirm the
>> status of inline deduplication in btrfs, and whether this is the
>> correct location to follow its support?
>
> Lu Fengqi has posted the latest patchset (v14.4).
> https://marc.info/?l=linux-btrfs&m=149984943031184&w=2
>
> Unfortunately, it has not been committed yet.
>
Thanks for your response; I will go through the patches. Could you also
help answer my question about the progress and status of this work? Do we
have any test run reports that describe its stability, performance
metrics, and other known issues, and possibly a roadmap for committing it?

Thanks
Shally

> Thanks,
> Tsutomu
>
>>
>> Thanks
>> Shally

Re: status of inline deduplication in btrfs

2017-08-23 Thread Tsutomu Itoh

On 2017/08/23 23:52, shally verma wrote:
> Hi,
>
> Through the btrfs wiki, I learned about the inline dedup patch and this
> git location, https://github.com/adam900710/linux, but I am not sure
> what its progress and status are. Could anyone please confirm the
> status of inline deduplication in btrfs, and whether this is the
> correct location to follow its support?

Lu Fengqi has posted the latest patchset (v14.4).
https://marc.info/?l=linux-btrfs&m=149984943031184&w=2

Unfortunately, it has not been committed yet.

Thanks,
Tsutomu

> 
> Thanks
> Shally

status of inline deduplication in btrfs

2017-08-23 Thread shally verma

Hi,

Through the btrfs wiki, I learned about the inline dedup patch and this git
location, https://github.com/adam900710/linux, but I am not sure what its
progress and status are. Could anyone please confirm the status of inline
deduplication in btrfs, and whether this is the correct location to follow
its support?

Thanks
Shally
