So ... the way things presently are, you'd ideally know in advance which of
the data you were planning to write has duplicate copies.  You could enable
dedup, write all the highly duplicated data, then turn dedup off and write
all the non-duplicate data.  Obviously, though, that's a fairly implausible
scenario in practice.

In reality, while you're writing, duplicate blocks are going to be mixed in
with your non-duplicate blocks, which fundamentally means the system has to
calculate the checksums and enter them into the DDT even for the unique
blocks...  simply because the first time the system sees any given block, it
doesn't yet know whether that block will be duplicated later.
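
Roughly, that write path boils down to this (a toy Python sketch, not the
actual ZFS code; the dict standing in for the DDT and the fake block
addresses are just for illustration):

  # Toy model of the dedup write path -- every block gets checksummed and
  # entered into the "DDT", because at write time we can't know whether a
  # later write will match it.
  import hashlib

  ddt = {}          # checksum -> {"refcount": n, "addr": block pointer}
  next_addr = 0     # stand-in for an on-disk block address

  def write_block(data):
      global next_addr
      cksum = hashlib.sha256(data).hexdigest()
      entry = ddt.get(cksum)
      if entry:                     # duplicate: just bump the refcount
          entry["refcount"] += 1
          return entry["addr"]
      addr = next_addr              # unique (so far): allocate and record it
      next_addr += 1
      ddt[cksum] = {"refcount": 1, "addr": addr}
      return addr

  # Even the one-off blocks pay the checksum + DDT-insert cost:
  write_block(b"unique block, never seen again")
  write_block(b"popular block")
  write_block(b"popular block")     # only this one actually dedups

The point being that the unique block above ends up occupying a DDT entry
forever, for no benefit.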

But as you said, once data has been written and has sat around for a while,
the probability that a unique block will ever gain a duplicate diminishes
over time.  At that point its DDT entry is just a burden.

I would think the ideal situation would be to take your idea of un-dedup'ing
unique blocks a step further: un-dedup unique blocks that are older than some
configurable threshold.  Maybe you could have a command a sysadmin could run
to scan the whole pool performing that operation, but it's the kind of
maintenance that really ought to happen on access, too.  Somebody goes back
and reads a jpg from last year, the system reads it and consequently loads
the DDT entry, discovers that the block is unique and has been for a long
time, and throws out the DDT info.
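
Purely hypothetical, but the on-access check could be as simple as this
(again a Python toy, not anything that exists in ZFS; the field names, the
function name, and the threshold are all made up, and it assumes each DDT
entry carries a birth timestamp):

  import time

  PRUNE_AGE = 365 * 24 * 3600       # "older than some configurable threshold"

  def read_block(ddt, cksum, now=None):
      now = now or time.time()
      entry = ddt.get(cksum)
      if entry is None:
          return None               # never dedup'ed, or already pruned
      if entry["refcount"] == 1 and now - entry["birth"] > PRUNE_AGE:
          # Unique, and has been for a long time: throw out the DDT info
          # so it stops bloating the table.  The block itself stays put.
          del ddt[cksum]
      return entry

  # That year-old jpg: one reference, written two years ago -> entry dropped.
  ddt = {"abc123": {"refcount": 1, "addr": 7,
                    "birth": time.time() - 2 * PRUNE_AGE}}
  read_block(ddt, "abc123")
  print(ddt)                        # {}

The scan-the-whole-pool command would just be the same check applied to every
DDT entry, instead of only the ones a read happens to touch.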

But by talking about it, we're just chasing pipe dreams, 'cause we all know
ZFS development is in a sorry state now.  But one can dream...

finglonger


