On 2013-02-04 17:10, Karl Wagner wrote:
> OK then, I guess my next question would be what's the best way to
> "undedupe" the data I have?
> Would it work for me to zfs send/receive on the same pool (with dedup
> off), deleting the old datasets once they have been 'copied'? I think I
> remember reading somewhere that the DDT never shrinks, so this would not
> work, but it would be the simplest way.
> Otherwise, I would be left with creating another pool or destroying and
> restoring from a backup, neither of which is ideal.
If you have enough space, then copying with dedup=off should work
(zfs send, rsync, whatever works best for you).
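To make the copy-with-dedup-off idea concrete, here is a minimal sketch
that only builds the command sequence and prints it for review; it does
not run anything. The names tank/data, the snapshot name "undedup" and
the "_nodedup" suffix are hypothetical. It assumes the dataset has a
parent, has no child datasets, and has no other snapshots you need to
keep (the destroy step removes them). A plain zfs send carries no
properties, so the received copy inherits dedup=off from the parent;
note that this setting also applies to any siblings that inherit it.

```python
import shlex


def undedup_commands(dataset, snapshot="undedup", suffix="_nodedup"):
    """Return zfs commands (as strings) that rewrite `dataset` without dedup.

    Nothing is executed here: review the list, then run it by hand.
    `snapshot` and `suffix` are illustrative names, not zfs conventions.
    """
    if "/" not in dataset:
        raise ValueError("expected pool/dataset, not a bare pool name")
    parent = dataset.rsplit("/", 1)[0]   # pool or parent dataset
    copy = dataset + suffix              # temporary name for the new copy
    snap = f"{dataset}@{snapshot}"
    q = shlex.quote
    return [
        # New blocks written under `parent` are no longer deduplicated.
        f"zfs set dedup=off {q(parent)}",
        # Freeze the data to be copied.
        f"zfs snapshot {q(snap)}",
        # Rewrite every block into a fresh dataset on the same pool.
        f"zfs send {q(snap)} | zfs receive {q(copy)}",
        # Drop the old, deduplicated dataset and its snapshots.
        f"zfs destroy -r {q(dataset)}",
        # Put the copy back under the original name.
        f"zfs rename {q(copy)} {q(dataset)}",
    ]


if __name__ == "__main__":
    for cmd in undedup_commands("tank/data"):
        print(cmd)
```

Verify the copy before running the destroy step, and expect that step to
be the slow one, since that is where the DDT entries get released.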
I think the DDT should shrink, deleting entries as soon as their
reference count drops to 0. However, this by itself can take quite a
while and cause lots of random IO; in my case this might have been the
reason for system hangs and/or panics due to memory starvation. Still,
after a series of reboots (and a couple of weeks of disk-thrashing) I
was able to get rid of some more offending datasets in my tests a
couple of years ago...
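For a rough feel of the memory pressure involved, the arithmetic can be
sketched as below. The entry count in the example is made up, and the
default of 320 bytes per in-core entry is only a commonly quoted rule of
thumb, treated here as an assumption; `zpool status -D <pool>` reports
the actual entry count and per-entry sizes for a given pool, and those
are the numbers to plug in.

```python
def ddt_core_bytes(entries, bytes_per_entry=320):
    """Estimate RAM needed to hold the whole DDT in core.

    `bytes_per_entry` defaults to an assumed rule-of-thumb value;
    replace it with the in-core size your pool actually reports.
    """
    return entries * bytes_per_entry


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    entries = 10_000_000  # hypothetical pool with ten million unique blocks
    need = ddt_core_bytes(entries)
    print(f"{entries} entries -> about {to_gib(need):.2f} GiB in core")
```

If the estimate is well above what the ARC can hold, destroying
deduplicated datasets is likely to turn into the random-IO grind
described above.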
As for a smarter "undedup": I asked about this recently, proposing a
"method" to do it in a stone-age way, but overall there is no ready
solution.
zfs-discuss mailing list