On Tue, Jul 31, 2012 at 9:36 AM, Ray Arachelian <r...@arachelian.com> wrote:
> On 07/31/2012 09:46 AM, opensolarisisdeadlongliveopensolaris wrote:
>> Dedup: First of all, I don't recommend using dedup under any
>> circumstance. Not that it's unstable or anything, just that the
>> performance is so horrible, it's never worthwhile. But particularly
>> with encrypted data, you're guaranteed to have no duplicate data
>> anyway, so it would be a pure waste. Don't do it.
>> _______________________________________________ zfs-discuss mailing
>> list email@example.com
> One thing you can do is enable dedup when you copy all your data from
> one zpool to another, then, when you're done, disable dedup. It will no
> longer waste a ton of memory, and your new volume will have a high dedup
> ratio. (Obviously anything you add after you turn dedup off won't be
> deduped.) You can keep the old pool as a backup, or wipe it or whatever
> and later on do the same operation in the other direction.
Once data has been written deduped, you will always use that memory
when you read any files that were written while dedup was enabled,
so you do not save any memory unless you do not normally access most
of your data.
Also, don't let the system crash :D or try to delete too much from the
deduped dataset :D (including snapshots or the dataset itself), because
then you have to reload all (or most) of the DDT in order to delete the
files. This gets a lot of people in trouble (including me at $work
:|), because you need enough RAM available at all times to load most
of the DDT (>75%, to grab a number out of the air) in case the server
crashes. Otherwise you are stuck with a machine trying to verify its
filesystem for hours. I have one test system with 4 GB of RAM and
2 TB of deduped data; when it crashed (panic, power failure, etc.) it
would take 8-12 hours to boot up again. It now has <1 TB of data and
boots in about 5 minutes or so.
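The bookkeeping behind these warnings can be sketched with a toy model (a simplification, not ZFS's actual DDT code; the class and field names are hypothetical). With dedup on, each write looks up the block's checksum in a table and either stores the block or bumps a reference count; blocks written with dedup off never enter the table; and every free of a deduped block has to find its table entry again to decrement it, which is why large deletes need most of the table in memory.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Hypothetical toy pool: models only dedup-table bookkeeping, not real ZFS."""
    dedup: bool = True
    ddt: dict = field(default_factory=dict)  # checksum -> reference count
    ddt_lookups: int = 0                     # times the dedup table was consulted
    physical_blocks: int = 0                 # blocks actually stored on "disk"

    def write(self, block: bytes) -> tuple:
        """Store one block and return a toy block pointer (kind, checksum)."""
        if not self.dedup:
            self.physical_blocks += 1        # dedup off: the DDT is never touched
            return ("plain", None)
        key = hashlib.sha256(block).hexdigest()
        self.ddt_lookups += 1
        if key in self.ddt:
            self.ddt[key] += 1               # duplicate: only add a reference
        else:
            self.ddt[key] = 1                # first copy: store it
            self.physical_blocks += 1
        return ("dedup", key)

    def free(self, bp: tuple) -> None:
        """Free one block; a deduped block must be found in the DDT first."""
        kind, key = bp
        if kind == "plain":
            self.physical_blocks -= 1
            return
        self.ddt_lookups += 1                # every free of a deduped block hits the DDT
        self.ddt[key] -= 1
        if self.ddt[key] == 0:               # last reference gone: release the block
            del self.ddt[key]
            self.physical_blocks -= 1


pool = ToyPool(dedup=True)
pointers = [pool.write(b"A" * 4096) for _ in range(10)]  # copy with dedup on
pool.dedup = False
pointers.append(pool.write(b"A" * 4096))                 # later write, dedup off
print(pool.physical_blocks, len(pool.ddt))                # 2 1
for bp in pointers:                                       # delete everything
    pool.free(bp)
print(pool.ddt_lookups, pool.physical_blocks)             # 20 0
```

In the toy, deleting the ten deduped copies costs ten table lookups even though they occupy a single physical block.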
> Anyone know if zfs send | zfs receive will maintain the deduped files after
> this? Maybe once deduped you can wipe the old pool, then use send|receive
> to get a deduped backup?
No, zfs send | zfs recv will not dedup the received data unless dedup
is enabled on the receiving side; however, the existing files remain
deduped until the snapshot that is retaining them is removed.
From practical work, dedup is only worth it if you have a dedup ratio
above 10 or a small deduped dataset, because in practice at $work we have
found that 1 TB of deduped data requires 6-8 GB of RAM (using
multi-hundred-GB files, so 128K record sizes; with smaller record sizes
the RAM numbers balloon). This means that even at a low dedup ratio of 3,
it is still cheaper to just buy more disks than to find a system that can
hold enough RAM to dedup 4-5 TB of data (64-128 GB of RAM). Of course, keep
in mind that we at $work only have practical knowledge of systems up to
4 TB of deduped data. We are planning some 45+ TB deployments and plan to
use compression=gzip-9 and dedup=off.
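The 6-8 GB per TB figure above can be turned into a back-of-the-envelope calculator. This sketch assumes (for illustration only) that RAM scales linearly with the number of blocks, i.e. with data size, and inversely with record size, since halving the record size doubles the number of dedup table entries; the function name is hypothetical, not a ZFS tool. It models only the table itself: for 4-5 TB it gives roughly 24-40 GB, below the 64-128 GB quoted above, so treat it as a lower bound rather than a sizing rule.

```python
def estimate_ddt_ram_gb(data_tb: float, recordsize_kib: int = 128,
                        gb_per_tb_at_128k: tuple = (6.0, 8.0)) -> tuple:
    """Rough (low, high) RAM estimate in GB for the dedup table alone.

    Assumption: RAM scales linearly with block count, anchored to the
    6-8 GB per TB observed with 128K records.
    """
    if data_tb < 0 or recordsize_kib <= 0:
        raise ValueError("data_tb must be >= 0 and recordsize_kib > 0")
    blocks_factor = 128 / recordsize_kib   # e.g. 64K records -> 2x the blocks
    low, high = gb_per_tb_at_128k
    return (data_tb * low * blocks_factor, data_tb * high * blocks_factor)


for tb, rs in [(1, 128), (4, 128), (5, 128), (1, 8)]:
    low, high = estimate_ddt_ram_gb(tb, rs)
    print(f"{tb} TB @ {rs}K records: {low:.0f}-{high:.0f} GB")
```

The last case shows why small records are painful: the same 1 TB at 8K records needs 16 times the table memory.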
As far as the OP is concerned: unless you have a dataset that will
dedup well, don't bother with it; use compression instead (don't use
both compression and dedup, because you will shrink the average record
size and balloon the memory usage). As mentioned by Ned Harvey, dedup
of encrypted data is probably completely useless (it depends on the
chaining mode: duplicate data in the encrypted dataset may be
encrypted the same way, allowing dedup of the encrypted data stream).
However, if the encryption is done below zfs, as GELI would do by
handing zfs an encrypted block device, then the use of dedup reverts
to the standard question: "are you going to have enough duplicate data
to get a ratio that is high enough to be worth the RAM?"
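To illustrate the chaining-mode point: if identical plaintext blocks are encrypted with the same key and the same IV/nonce, the ciphertext blocks are identical too and remain dedupable; with a fresh random IV per block they all differ. The sketch below uses a toy SHA-256 keystream (hypothetical helper names, not a real or secure cipher, and not how GELI or any particular tool encrypts) and counts distinct block checksums as a stand-in for what dedup would find.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only, not secure crypto)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, block: bytes, deterministic: bool) -> bytes:
    """Deterministic mode reuses a fixed nonce; randomized mode draws a fresh one per block."""
    nonce = b"\x00" * 16 if deterministic else os.urandom(16)
    ks = keystream(key, nonce, len(block))
    return nonce + bytes(a ^ b for a, b in zip(block, ks))


def unique_blocks(blocks) -> int:
    """What a dedup layer would see: the number of distinct block checksums."""
    return len({hashlib.sha256(b).digest() for b in blocks})


key = os.urandom(32)
plain = [b"same 4K of data".ljust(4096, b"\0")] * 8
print("plaintext:    ", unique_blocks(plain))
print("deterministic:", unique_blocks(toy_encrypt(key, b, True) for b in plain))
print("randomized:   ", unique_blocks(toy_encrypt(key, b, False) for b in plain))
```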
Rereading the OP to make sure the OP's question is actually
answered: use of encryption will not affect zfs integrity beyond the
normal issues associated with zfs integrity and disk encryption. That
is, a missing encryption key is like a failed disk (if encryption is
below zfs) or like losing any other file on a zfs filesystem (if above
zfs). Zfs integrity (checksumming and transactions) is not affected by
the underlying block device (if encryption is below zfs) or by the
contents of a file on the filesystem (if encryption is above zfs). The
"raid" features of zfs also work no differently: 'n' encrypted block
devices are no different from 'n' plain hard drives as far as ZFS is
concerned.
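A toy sketch of the layering argument, assuming hypothetical classes (nothing here is a ZFS or GELI API): the checksum is computed and verified on the logical data above the device, so an encrypting device underneath is invisible to it, and a device read back with the wrong key returns data that fails the checksum, just as a failing disk would.

```python
import hashlib


class PlainDisk:
    """Hypothetical block device that stores sectors exactly as given."""

    def __init__(self):
        self.sectors = {}

    def write(self, lba: int, data: bytes) -> None:
        self.sectors[lba] = data

    def read(self, lba: int) -> bytes:
        return self.sectors[lba]


class ToyEncryptedDisk(PlainDisk):
    """Hypothetical GELI-like layer that encrypts below the filesystem.

    Uses a toy XOR keystream derived from (key, lba); illustration only,
    not real or secure encryption.
    """

    def __init__(self, key: bytes):
        super().__init__()
        self.key = key

    def _keystream(self, lba: int, n: int) -> bytes:
        out, counter = b"", 0
        while len(out) < n:
            out += hashlib.sha256(self.key + lba.to_bytes(8, "big")
                                  + counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:n]

    def _xor(self, lba: int, data: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(data, self._keystream(lba, len(data))))

    def write(self, lba: int, data: bytes) -> None:
        super().write(lba, self._xor(lba, data))   # only ciphertext is stored

    def read(self, lba: int) -> bytes:
        return self._xor(lba, super().read(lba))   # plaintext handed upward


def checksummed_write(dev: PlainDisk, lba: int, data: bytes) -> bytes:
    """Filesystem layer: checksum the logical data, then hand it to the device."""
    dev.write(lba, data)
    return hashlib.sha256(data).digest()


def checksummed_read(dev: PlainDisk, lba: int, checksum: bytes) -> bytes:
    """Filesystem layer: verify the logical data, whatever the device does below."""
    data = dev.read(lba)
    if hashlib.sha256(data).digest() != checksum:
        raise OSError(f"checksum mismatch at lba {lba}")  # same outcome as a bad disk
    return data


for dev in (PlainDisk(), ToyEncryptedDisk(b"secret")):
    cks = checksummed_write(dev, 0, b"hello zfs")
    print(type(dev).__name__, checksummed_read(dev, 0, cks))

enc = ToyEncryptedDisk(b"secret")
cks = checksummed_write(enc, 0, b"hello zfs")
wrong = ToyEncryptedDisk(b"wrong key")
wrong.sectors = enc.sectors                # same sectors, wrong key
try:
    checksummed_read(wrong, 0, cks)
except OSError as err:
    print("wrong key:", err)
```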