So ... The way things presently are, ideally you would know in advance which
data you were planning to write that has duplicate copies. You could enable
dedup, write all the highly duplicated data, then turn dedup off and write all
the non-duplicate data. Obviously, though, that's a fairly implausible scenario
in practice.
In reality, while you're writing, duplicate blocks are going to be mixed in
with non-duplicate blocks, which fundamentally means the system needs to
calculate the checksums and enter them into the DDT even for the unique
blocks... simply because the first time the system sees each duplicate block,
it doesn't yet know that the block will be duplicated later.
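To make that concrete, here's a minimal sketch of a dedup-aware write path in Python. The names (`ddt`, `write_block`) and the dict-based table are illustrative assumptions, not actual ZFS internals; the point is just that every write, unique or not, pays for a checksum and a DDT entry:

```python
import hashlib
import time

# Hypothetical dedup table: checksum -> {"refcount": ..., "birth": ...}.
# In ZFS this is the on-disk/in-core DDT; here it's just a dict.
ddt = {}

def write_block(data: bytes) -> str:
    """Sketch of a dedup-enabled write: checksum every block and
    consult the DDT, because we can't know at write time whether an
    identical block will show up later."""
    cksum = hashlib.sha256(data).hexdigest()
    entry = ddt.get(cksum)
    if entry:
        # Duplicate: bump the refcount instead of writing new data.
        entry["refcount"] += 1
    else:
        # Unique (so far): still costs a full DDT entry.
        ddt[cksum] = {"refcount": 1, "birth": time.time()}
    return cksum
```

Even a pool full of one-of-a-kind blocks ends up with one DDT entry per block, which is exactly the memory overhead being discussed.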
But as you said, after data is written and sits around for a while, the
probability that a unique block will ever be duplicated diminishes over time.
So I would think the ideal situation would be to take your idea of un-dedup
for unique blocks and take it a step further: un-dedup unique blocks that are
older than some configurable threshold. Maybe you could have a command for a
sysadmin to run that scans the whole pool performing this operation, but it's
the kind of maintenance that really should be done upon access, too. Somebody
goes back and reads a jpg from last year; the system reads it, loads the DDT
entry as a consequence, discovers that the block is unique and has been for a
long time, and throws out the DDT info.
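A sketch of that "un-dedup on access" idea, again with entirely hypothetical names and structures (the dict-based `ddt` and the `max_age` tunable are my assumptions, not anything ZFS provides):

```python
import time

# Hypothetical dedup table: checksum -> {"refcount": ..., "birth": ...}.
ddt = {}

def maybe_undedup_on_read(cksum: str, now: float,
                          max_age: float = 365 * 86400) -> bool:
    """On read, drop the DDT entry for a block that is still unique
    (refcount == 1) and older than a configurable threshold.
    Returns True if the entry was evicted."""
    entry = ddt.get(cksum)
    if entry and entry["refcount"] == 1 and now - entry["birth"] > max_age:
        del ddt[cksum]
        return True
    return False
```

The check piggybacks on a read that's happening anyway, so old unique blocks gradually stop costing DDT memory without a separate pool-wide scan.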
But by talking about it, we're just chasing pipe dreams, because we all know
ZFS development is in a rough spot these days. Still, one can dream...
zfs-discuss mailing list