Chris Cosby wrote:
>
>
> On Tue, Jul 22, 2008 at 11:19 AM, <[EMAIL PROTECTED]> wrote:
>
>     [EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM:
>
>     > To do dedup properly, it seems like there would have to be some
>     > overly complicated methodology for a sort of delayed dedup of the
>     > data. For speed, you'd want your writes to go straight into the
>     > cache and get flushed out as quickly as possible, keeping everything as
>     > ACID as possible. Then, a dedup scrubber would take what was
>     > written, do the voodoo magic of checksumming the new data, scanning
>     > the tree to see if there are any matches, locking the duplicates,
>     > running the usage counters up or down for that block of data, swapping
>     > out inodes, and marking the duplicate data as free space.
>     I agree, but what you are describing is file-based dedup. ZFS already
>     has the groundwork for dedup in the system (block-level checksumming
>     and pointers).
>
>     > It's a
>     > lofty goal, but one that is doable. I guess this is only necessary
>     > if deduplication is done at the file level. If done at the block
>     > level, it could possibly be done on the fly, what with the already
>     > implemented checksumming at the block level,
>
>     Exactly -- that is why it is attractive for ZFS: so much of the
>     groundwork is already done and needed for the fs/pool anyway.
>
>     > but then your reads
>     > will suffer because pieces of files can potentially be spread all
>     > over hell and half of Georgia on the vdevs.
>
>     I don't know that you can make this statement without some study of an
>     actual implementation on real-world data -- and then, because it is
>     block based, you should see varying degrees of this dedup-induced
>     fragmentation depending on data/usage.
>
> It's just a NonScientificWAG. I agree that the duplicated blocks will in
> most cases be part of identical files anyway, and thus be lined up exactly
> as you'd want them. I was just free-thinking and typing.
>  
No, you are right to be concerned about block-level dedup seriously 
impacting seeks.  The problem is that, in many common storage scenarios, 
you will have not just similar files, but multiple common sections across 
many files.  The various standard productivity-app documents, for example, 
will not just share the same header sections; internally, there will be 
duplicated runs of considerable length shared with other documents from 
the same application.  Your 5MB Word file is thus likely to share several 
(actually, many) multi-kB segments with other Word files, so you end up 
seeking all over the disk to read _most_ Word files.  Which really sucks.  
I can list at least a couple more common scenarios where dedup has the 
potential to save at least some reasonable amount of space, yet will 
absolutely kill read performance.
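
A toy sketch of why (my own illustration -- it has nothing to do with the
actual ZFS code, and all the names here are made up): once blocks are shared
by checksum, a file that is "new" on disk can still have many of its block
pointers aimed back at wherever the first copies of its shared segments
happened to land, so a logically sequential read turns into a series of seeks.

import hashlib

class ToyDedupStore:
    def __init__(self):
        self.disk = []            # physical blocks; list index == disk address
        self.by_checksum = {}     # checksum -> physical address of first copy

    def write_file(self, blocks):
        """Write a file given as a list of equal-sized blocks; return its block map."""
        addrs = []
        for block in blocks:
            key = hashlib.sha256(block).hexdigest()
            if key not in self.by_checksum:      # unique data: append to the "disk"
                self.disk.append(block)
                self.by_checksum[key] = len(self.disk) - 1
            addrs.append(self.by_checksum[key])  # either way, point at the one copy
        return addrs

store = ToyDedupStore()
doc_a = [b"HDR00000", b"SHARED01", b"SHARED02", b"UNIQA001", b"UNIQA002"]
doc_b = [b"HDR00000", b"UNIQB001", b"SHARED01", b"UNIQB002", b"SHARED02"]
store.write_file(doc_a)
map_b = store.write_file(doc_b)
jumps = sum(1 for prev, nxt in zip(map_b, map_b[1:]) if nxt != prev + 1)
print("doc_b block map: %s" % map_b)                        # [0, 5, 1, 6, 2]
print("non-sequential transitions (seeks): %d of %d" % (jumps, len(map_b) - 1))

Reading doc_b back means bouncing between its own new blocks and the blocks
it shares with doc_a, which is exactly the seek pattern described above.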


>     For instance, I would imagine that in many scenarios much of the
>     deduped data blocks would belong to the same or very similar files.
>     In that case the blocks were laid out as well as they could be on the
>     first write, so the deduped blocks would point to a pretty sequential
>     line of blocks.  Now, some files may have duplicate headers or similar
>     portions of data -- these may cause you to jump around the disk; but I
>     do not know how often this would be hit, or how much it would impact
>     real-world usage.
>
>
>     > Deduplication is going
>     > to require the judicious application of hallucinogens and man hours.
>     > I expect that someone is up to the task.
>
>     I would prefer the coder(s) not be seeing "pink elephants" while
>     writing this, but yes, it can and will be done.  It (I believe) will
>     be easier after the grow/shrink/evac code paths are in place, though.
>     Also, the grow/shrink/evac path allows (if it is done right) for other
>     cool things, like a base on which to build a roaming defrag that takes
>     into account snaps, clones, live data, and the like.  I know that some
>     feel that the grow/shrink/evac code is more important for home users,
>     but I think that it is super important for most of these additional
>     features.
>
> The elephants are just there to keep the coders company. There are 
> tons of benefits to dedup, both for home and non-home users. I'm 
> happy that it's going to be done. I expect the first complaints will 
> come from people who don't understand it, when their df and du 
> numbers look different from their zpool status ones. Perhaps df/du 
> will just have to be faked out for those folks, or we could just apply 
> the same hallucinogens to them instead.
>
I'm still not convinced that dedup is really worth it for anything but 
very limited, constrained usage. Disk is just so cheap that you 
_really_ have to have an enormous amount of duplication before the space 
savings counter the performance penalties of dedup.

This in many ways reminds me of last year's discussion over file 
versioning in the filesystem.  It sounds like a cool idea, but it's not 
a generally good idea.  I tend to think that this kind of problem is 
better handled by the applications themselves, if they are concerned about it.

Pretty much, here's what I've heard:

Dedup Advantages:

(1)  Save space relative to the amount of duplication.  This is highly 
dependent on workload, and ranges from 0% to 99%, but the distribution 
of possibilities isn't a bell curve (i.e., the average space saved isn't 
50%).


Dedup Disadvantages:

(1)  Increased codebase complexity, in both the dedup-during-write case 
and the ex-post-facto batched dedup case.

(2)  A noticeable write performance penalty (assuming block-level dedup on 
write), with potential write cache issues; see the sketch after this list.

(3)  Very significant post-write dedup time, at least on the order of a 
'zfs scrub'.  Also, such a post-write pass more or less takes the zpool 
out of use while it runs.

(4)  If dedup is done at the block level, not at the file level, it kills 
read performance, effectively turning every read of a dedup'd file from a 
sequential read into a random read.  That is, block-level dedup 
drastically accelerates filesystem fragmentation.

(5)  Something no one has talked about, but which is of concern: by 
removing duplication, you increase the likelihood that loss of the 
"master" segment will corrupt many more files.  Yes, ZFS has self-healing 
and such.  But, particularly in the case where there is no ZFS pool 
redundancy (or pool-level redundancy has been compromised), the loss of 
one block can thus be many times more severe.
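
To make (2) and (5) concrete, here is a minimal sketch of what a
block-level dedup write path has to add (again my own illustration, not ZFS
internals -- the table layout and names are invented): a checksum and a
dedup-table probe on every single write, plus reference counts that let one
physical block back many files.

import hashlib

dedup_table = {}   # checksum -> {"addr": physical address, "refs": refcount}
next_addr = 0      # next free "disk" address

def write_block(block):
    """Write one block, dedup'ing by checksum; return its physical address."""
    global next_addr
    key = hashlib.sha256(block).hexdigest()       # extra CPU on *every* write
    entry = dedup_table.get(key)                  # extra lookup on *every* write
    if entry is not None:
        entry["refs"] += 1                        # duplicate: bump refcount only
        return entry["addr"]
    addr, next_addr = next_addr, next_addr + 1    # unique: allocate and write
    dedup_table[key] = {"addr": addr, "refs": 1}
    return addr

# Ten "files" that begin with the same header block end up depending on a
# single physical copy -- which is exactly the exposure in point (5).
header = b"common productivity-app header block"
for _ in range(10):
    write_block(header)
print("files sharing one physical block: %d"
      % dedup_table[hashlib.sha256(header).hexdigest()]["refs"])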


We need to think long and hard about what the real, widespread benefits of 
dedup are before committing to a filesystem-level solution rather than an 
application-level one.  In particular, we need some real-world data on the 
actual level of duplication under a wide variety of circumstances.
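
As a starting point for gathering that data, something like the rough
survey below could be run against real datasets. It is only a sketch of
mine: it assumes fixed-size blocks and ignores compression, sparse files,
and how ZFS would actually split records, so it only approximates the
achievable savings.

import hashlib, os, sys

def survey(root, blocksize=128 * 1024):
    """Walk 'root', hash fixed-size blocks, and count duplicate blocks."""
    seen, total_blocks, dup_blocks = set(), 0, 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                f = open(path, "rb")
            except (IOError, OSError):
                continue                          # unreadable file, skip it
            while True:
                block = f.read(blocksize)
                if not block:
                    break
                total_blocks += 1
                digest = hashlib.sha256(block).digest()
                if digest in seen:
                    dup_blocks += 1
                else:
                    seen.add(digest)
            f.close()
    return total_blocks, dup_blocks

if __name__ == "__main__":
    total, dups = survey(sys.argv[1] if len(sys.argv) > 1 else ".")
    pct = 100.0 * dups / total if total else 0.0
    print("%d blocks scanned, %d duplicates (%.1f%% potential savings)"
          % (total, dups, pct))

Running it over a few representative datasets (home directories, a mail
spool, a build tree) would at least tell us which end of that 0%-99% range
we are actually living in.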

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
