On Tue, Jul 31, 2012 at 9:36 AM, Ray Arachelian <r...@arachelian.com> wrote:
> On 07/31/2012 09:46 AM, opensolarisisdeadlongliveopensolaris wrote:
>> Dedup: First of all, I don't recommend using dedup under any
>> circumstance. Not that it's unstable or anything, just that the
>> performance is so horrible, it's never worthwhile. But particularly
>> with encrypted data, you're guaranteed to have no duplicate data
>> anyway, so it would be a pure waste. Don't do it.
>
> One thing you can do is enable dedup when you copy all your data from
> one zpool to another, then, when you're done, disable dedup. It will no
> longer waste a ton of memory, and your new volume will have a high dedup
> ratio. (Obviously anything you add after you turn dedup off won't be
> deduped.) You can keep the old pool as a backup, or wipe it or whatever
> and later on do the same operation in the other direction.
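Ray's copy-with-dedup-then-disable workflow might be sketched like this (pool and dataset names here are hypothetical; note that the dedup property only affects newly written data, and a received dataset inherits it from its parent unless the send stream carries its own properties):

```shell
# Hypothetical pools: "oldpool" holds the data, "newpool" is the target.
zfs set dedup=on newpool                    # new writes under newpool get deduped
zfs snapshot oldpool/data@migrate
zfs send oldpool/data@migrate | zfs receive newpool/data
zfs set dedup=off newpool                   # later writes will not be deduped
zpool get dedupratio newpool                # ratio achieved by the copy
zdb -DD newpool                             # DDT histogram, to see the table size
```

That's a sketch, not a tested recipe; try it on scratch pools first.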
Once something is written deduped, you will always need the memory when you want to read any files that were written while dedup was enabled, so you do not save any memory unless you do not normally access most of your data. Also, don't let the system crash :D or try to delete too much from the deduped dataset :D (including snapshots or the dataset itself), because then you have to reload all (or most) of the DDT in order to delete the files. This gets a lot of people in trouble (including me at $work :|) because you need to have the RAM available at all times to load most of the DDT (>75%, to grab a number out of the air) in case the server crashes. Otherwise you are stuck with a machine trying to verify its filesystem for hours. I have one test system that has 4 GB of RAM and 2 TB of deduped data; when it crashed (panic, power failure, etc.) it would take 8-12 hours to boot up again. It now has <1 TB of data and boots in about 5 minutes.

> Anyone know if zfs send | zfs get will maintain the deduped files after
> this? Maybe once deduped you can wipe the old pool, then use get|send
> to get a deduped backup?

No, zfs send | zfs recv will not dedup new data unless dedup is on; however, the existing files remain deduped until the snapshot that is retaining them is removed.

From practical work, dedup is only worth it if you have a >10x dedup ratio or a small deduped dataset, because in practice at $work we have found that 1 TB of deduped data requires 6-8 GB of RAM (using multi-hundred-GB files, so 128K record sizes; with smaller record sizes the RAM numbers balloon). Which means that even at a low dedup ratio of 3, it is still cheaper to just buy more disks than to find a system that can hold enough RAM to dedup 4-5 TB of data (64-128 GB of RAM). Of course, keep in mind that we at $work only have practical knowledge of systems up to 4 TB of deduped data. We are planning some 45+ TB systems and plan on using compression=gzip-9 and dedup=off.
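For a rough sense of scale behind those RAM numbers, here is a back-of-envelope sketch. The ~320 bytes per in-core DDT entry is a commonly quoted rule of thumb, not a figure measured in this thread; our observed 6-8 GB per TB is higher, presumably from ARC overhead and blocks smaller than 128K:

```shell
# Worst case: every 128K record of 1 TiB of unique data gets its own DDT entry.
records=$(( (1 << 40) / (128 * 1024) ))      # number of 128K records in 1 TiB
ddt_mib=$(( records * 320 / 1024 / 1024 ))   # ~320 bytes per in-core entry (assumed)
echo "$records entries, ~$ddt_mib MiB of DDT per TiB at 128K recordsize"
```

Halving the record size to 64K doubles the entry count, which is why smaller records balloon the numbers so badly.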
As far as the OP is concerned: unless you have a dataset that will dedup well, don't bother with it; use compression instead (don't use both compression and dedup, because you will shrink the average record size and balloon the memory usage). As mentioned by Ned Harvey, dedup of encrypted data is probably completely useless (it depends on the chaining mode; duplicate data in the encrypted dataset may be encrypted the same way, allowing dedup of the encrypted data stream). However, if the encryption is done below ZFS, like what GELI would be doing by handing the encrypted block device to ZFS, then the use of dedup reverts back to the standard question: "are you going to have enough duplicate data to get a ratio that is high enough to be worth the RAM requirements?"

As I reread the OP, to make sure that the OP's question is actually answered: use of encryption will not affect ZFS integrity beyond the normal issues associated with ZFS integrity and disk encryption. That is, missing encryption keys are like a failed disk (if encryption is below ZFS) or like any other lost file on a ZFS filesystem (if above ZFS). ZFS integrity features (checksumming and the transactions) are not affected by the underlying block device (if encryption is below ZFS) or by the contents of a file on the filesystem (if encryption is above ZFS). The "raid" features of ZFS also work no differently; 'n' encrypted block devices are no different than 'n' plain hard drives as far as ZFS is concerned.

_______________________________________________
zfs-discuss mailing list
firstname.lastname@example.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss