Adam Leventhal wrote:
> B.1 zfs(1M)
> 
> The interface for enabling and disabling deduplication is simple and  
> straightforward, and follows the convention of other similar ZFS  
> settings. We simply add a new per-dataset property, dedup:
> 
>       zfs set dedup=<on | off | checksum>[,verify]
>       zfs get dedup

I'm happy with this.
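
For instance, based on the proposed syntax I'd expect usage along these
lines (the dataset name here is just an example):

      zfs set dedup=sha256,verify tank/home
      zfs get dedup tank/home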

> The acceptable values for the dedup property are as follows:
> 
>       off (the default)
>       on (see below)
>       on,verify
>       verify
>       sha256
>       sha256,verify
>       fletcher4,verify
>       fletcher2,verify

Given that dedup allows specifying a checksum, does this mean that there 
need not be a relationship between the checksum used for the block on 
disk (i.e. the one stored in the blkptr_t) and the one used for dedup ?

Is this valid:

zfs set checksum=fletcher4 tank
zfs set dedup=sha256 tank

If so, what is stored on disk in the blkptr_t ?  I assume a fletcher4 
checksum is stored there.  Where is the sha256 checksum stored then ? In 
the DDT ?

Does this mean that deduplication is not using the blkptr checksum at 
all even if the blkptr checksum and dedup checksum are the same ?

When in the ZIO pipeline is the checksum specified with the dedup 
property calculated ?  I'm assuming it is in zio_write_bp_init(), after 
compression and after encryption, so that it is calculated on the block 
exactly as it will be written to disk.

Can gang blocks be deduplicated ?

> The dedup property can be set to any of the cryptographically strong
> checksums supported by ZFS (today just sha256). In this mode we rely
> on the checksum alone to ensure no data collisions. Alternatively the
> dedup property can be set to '<checksum>,verify', in which case the
> given checksum is used for comparison and the blocks themselves are
> then compared to guard against collisions. This is strictly relevant
> only for non-cryptographically secure checksums but we offer it as an
> option for customers who seek that reassurance. The value of 'on' uses
> the zpool-wide default defined by the zpool property dedupchecksum
> (see B.2.1).

Glad that you do offer verify as a choice.  It would be very useful to 
provide some sort of log output for the cases where verify found a 
collision - i.e. the checksum hashes matched but the verify said the 
blocks were different.  This would not be useful to end users, so it 
could be a DTrace SDT probe or only present in a DEBUG kernel.  If this 
ever reports a "hit" when dedup=sha256,verify it will make ZFS famous 
for finding collisions in SHA256.
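
If such a probe were added, one could watch for collisions with a 
one-liner along these lines (the probe name is purely hypothetical, 
whatever the implementation ends up calling it):

      dtrace -n 'sdt:zfs::dedup-verify-collision { trace(timestamp); }'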

> As an explicit request for input from the ARC, our fletcher2 implementation
> has been shown to be suboptimal and results in a large number of
> collisions (as a result, the default checksum has been changed to
> fletcher4). Should 'fletcher2,verify' be permitted as an option for
> consistency, or should we eliminate that option since it would rarely
> be an attractive choice for users due to the high number of hash
> collisions?

I don't think fletcher2,verify should be provided.

> B.2 zpool(1M)
> 
> B.2.1 Mutable properties
> 
> Two new mutable pool-wide properties will be added:
> 
>       zpool set dedupchecksum=<cryptographically strong checksum>

Why is this needed when we don't have a pool level property for
the default checksum or compression or encryption (or any other property 
that is inherited and has an "on | off | ..." style of setting) ?

Under what circumstances would this be needed rather than setting dedup 
to the required value ?
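
For example, as I read the proposal these two ways of enabling dedup on 
a dataset would end up equivalent (pool and dataset names are just 
examples):

      zpool set dedupchecksum=sha256 tank
      zfs set dedup=on tank/home      # 'on' picks up the pool-wide default

      zfs set dedup=sha256 tank/home  # same result with no pool property

If that is right, the pool property only saves typing the checksum name.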

Is this a new precedent that all properties with an "on" value should 
have a pool level property to determine what "on" is ?

>       zpool set dedupditto=<number>

> The second allows the administrator to select a threshold after which
> 2 copies of a block are stored rather than 1. For example, if many
> duplicate blocks exist deduplication would reduce that count to just 1;
> at some threshold, it becomes desirable to have multiple copies to
> guard against the multiplied effects of the loss of a single block.
> The default value is '100'.

I think I understand that this needs to be pool-wide because it is a 
SPA-level concept, not a dataset-level one.

Is it actually necessary to expose this tunable ?  Given there is already 
a per-dataset copies property, how does this interact with that ?
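
For example (names are just examples):

      zpool set dedupditto=100 tank   # add a 2nd copy once a block has 100 references
      zfs set copies=2 tank/home      # already stores 2 copies of every block

When both are in effect, which one determines how many copies are kept ?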

-- 
Darren J Moffat
