This case looks good. I have a few thoughts, though. First, wrt. fletcher2 ... I don't think it is a good idea to support the algorithm if it has these undesirable characteristics. It's not the case that you need to support it for compatibility.
Actually, that brings into question fletcher4 as well. I guess my question is, why not just support sha256 for now? I'm assuming that the motivation here is reduced computation time for fletcher vs. sha256?

The second question is about the default for dedupditto. It seems to me at least that a default value of "2" might be better than "100", as it means that dedup wins with the 3rd copy, and also wins by adding yet another level of data redundancy. (In other words, dedup actually improves data safety this way.) Where did the value of "100" come from?

In any case, none of the above items is enough to prevent me from issuing a +1 to the case. Technically the case falls under our 48-hour timeline for fast tracks, but I think we could review it and move the timeout to Thursday morning, at which point it would be approved by successful timeout with the necessary +1s and no derails. This would cause only a 24-hour delay, but allow the case to still fall within existing ARC practice. Would that be acceptable?

- Garrett

Adam Leventhal wrote:
> I'm sponsoring the following fasttrack on behalf of Jeff Bonwick and
> the ZFS team. The binding is patch and the commitment level is Committed.
>
> Apologies for the late notice, but if it is possible to review this case
> at the 10/21/2009 meeting, that would be much appreciated. We believe the
> interfaces as defined are in keeping with other ZFS interfaces.
>
> Please take particular note of the question at the end of B.1, where we're
> unsure of the best path and hope the ARC can provide guidance.
>
> Thanks.
>
> Adam
>
>
> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
> This information is Copyright 2009 Sun Microsystems
>
> 1. Introduction
>    1.1. Project/Component Working Name:
>         ZFS Deduplication Properties
>    1.2. Name of Document Author/Supplier:
>         Author: Jeffrey Bonwick
>    1.3. Date of This Document:
>         19 October, 2009
>
> 4. Technical Description
>
> A. Background
>
> Deduplication is a feature of modern storage platforms by which
> varying mechanisms are employed to reduce the total amount of data
> stored by eliminating and sharing common components. We are adding
> deduplication to ZFS in order to further enable market penetration
> of ZFS and the Sun Storage 7000 series.
>
> The algorithm employed by ZFS deduplication uses checksum-based
> comparison of blocks with optional verification (for example, with
> non-cryptographically secure checksums). Deduplication is performed
> across the entire ZFS storage pool; administrators can select whether
> individual datasets have deduplication enabled or not. This is useful
> in mixed-mode environments in which some datasets have highly
> duplicated data (e.g. VMware images, VDI, home directories, or email
> folders) and others are unique (e.g. databases).
>
> With this case we propose the user interface for enabling
> deduplication in ZFS.
>
>
> B. Interface
>
> B.1 zfs(1M)
>
> The interface for enabling and disabling deduplication is simple and
> straightforward, and follows the convention of other similar ZFS
> settings. We simply add a new per-dataset property, dedup:
>
>   zfs set dedup=<on | off | checksum>[,verify]
>   zfs get dedup
>
> The acceptable values for the dedup property are as follows:
>
>   off (the default)
>   on (see below)
>   on,verify
>   verify
>   sha256
>   sha256,verify
>   fletcher4,verify
>   fletcher2,verify
>
> The dedup property can be set to any of the cryptographically strong
> checksums supported by ZFS (today, just sha256). In this mode we rely
> on the checksum alone to ensure no data collisions. Alternatively, the
> dedup property can be set to '<checksum>,verify', in which case the
> given checksum is used for comparison and matching blocks are then
> compared to guard against collisions. This is strictly relevant only
> for non-cryptographically secure checksums, but we offer it as an
> option for customers who seek that reassurance.
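The checksum-plus-optional-verify scheme described in B.1 can be illustrated with a small model. This is a hypothetical in-memory sketch, not the ZFS implementation; the function and table names are invented for illustration:

```python
import hashlib

def dedup_write(block, table, verify=True):
    """Model of checksum-based dedup with optional verification.

    'table' is a hypothetical in-memory map of checksum -> stored block.
    Returns (checksum, was_duplicate).
    """
    key = hashlib.sha256(block).digest()
    existing = table.get(key)
    if existing is not None:
        # With 'verify', compare the actual bytes so a checksum
        # collision can never silently alias two different blocks.
        if not verify or existing == block:
            return key, True    # duplicate: reference the existing block
    # First copy (or a detected collision): store the block itself.
    table[key] = block
    return key, False
```

With a cryptographically strong checksum such as sha256, `verify=False` corresponds to trusting the checksum alone; with a weak checksum like fletcher, the ',verify' mode is the only collision-safe choice.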
> The value of 'on' uses the zpool-wide default defined by the zpool
> property dedupchecksum (see B.2.1).
>
> As an explicit request for input from the ARC: our fletcher2
> implementation has been shown to be suboptimal and results in a large
> number of collisions (as a result, the default checksum has been
> changed to fletcher4). Should 'fletcher2,verify' be permitted as an
> option for consistency, or should we eliminate that option since it
> would rarely be an attractive choice for users due to the high number
> of hash collisions?
>
>
> B.2 zpool(1M)
>
> B.2.1 Mutable properties
>
> Two new mutable pool-wide properties will be added:
>
>   zpool set dedupchecksum=<cryptographically strong checksum>
>   zpool set dedupditto=<number>
>
> The first selects the pool-wide default to be used when a dataset's
> dedup value is set to 'on' or 'on,verify'. The default value for
> dedupchecksum is 'sha256'.
>
> The second allows the administrator to select a threshold after which
> 2 copies of a block are stored rather than 1. For example, if many
> duplicate blocks exist, deduplication would reduce that count to just
> 1; at some threshold, it becomes desirable to have multiple copies to
> guard against the multiplied effects of the loss of a single block.
> The default value is '100'.
>
>
> B.2.2 Statistics
>
> Two new read-only pool-wide properties will be added to track
> deduplication efficacy:
>
>   deduptotal    # the amount of deduplicated data on disk
>   dedupinflated # the amount deduplicated data would occupy had
>                 # duplicates not been removed
>
> With these two properties and the pool's size property one could
> compute:
>
>   dedup efficacy = dedupinflated / deduptotal
>   dedup savings  = dedupinflated - deduptotal
>   dedup ratio    = (size + dedupinflated) / (size + deduptotal)
>
> Note that efficacy measures only data that was a candidate for
> deduplication (i.e. on which the dedup dataset property was enabled),
> whereas the ratio measures a similar value for all data regardless of
> whether it was a candidate for deduplication.
>
> The 'zpool status' command will be modified to present the size and
> dedup ratio and efficacy for the given pool or pools:
>
>   # zpool status tank
>     pool: tank
>    state: ONLINE
>     size: 464G
>    dedup: 1.90x (total) / 5.41x (dedup enabled)
>   ...
>
>
> C. Man Page Changes
>
> The zfs(1M) and zpool(1M) man pages will be modified to include the
> descriptions above for the new properties as well as an overview of
> the deduplication feature.
>
> 6. Resources and Schedule
>    6.4. Steering Committee requested information
>         6.4.1. Consolidation C-team Name:
>                OS/Net
>    6.5. ARC review type: FastTrack
>    6.6. ARC Exposure: open
>
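To make the arithmetic in B.2.2 concrete, here is a small worked example of the three derived metrics. The figures are invented for illustration; the property names follow the proposal:

```python
def dedup_stats(size, deduptotal, dedupinflated):
    """Compute the three derived metrics from B.2.2.

    size          -- the pool's size property
    deduptotal    -- deduplicated data actually on disk
    dedupinflated -- what that data would occupy had duplicates
                     not been removed
    """
    efficacy = dedupinflated / deduptotal
    savings = dedupinflated - deduptotal
    ratio = (size + dedupinflated) / (size + deduptotal)
    return efficacy, savings, ratio

# Hypothetical pool: 10 units of deduplicated data on disk standing in
# for 50 units of logical data, with size = 100.
efficacy, savings, ratio = dedup_stats(size=100, deduptotal=10,
                                       dedupinflated=50)
```

Here efficacy is 5.0x (only the dedup-enabled data), savings is 40 units, and the pool-wide ratio is 150/110, about 1.36x, corresponding to the two figures shown side by side in the proposed 'zpool status' output.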