For data de-duplication, the fixed-length block approach divides a file into
fixed-size blocks to find duplicates. However, similar data blocks may be
present at different offsets in two different datasets; in other words, the
block boundaries of similar data may differ. This is very common when some
bytes are inserted into a file: when the changed file is processed again for
dedup, all the blocks appear to have changed. See
http://www.mediafire.com/imageview.php?quickkey=qrvz1yhoima 
for illustration.
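
To make the problem concrete, here is a minimal sketch (hypothetical example,
not ZFS code, using a tiny block size just for illustration): inserting a
single byte shifts every subsequent fixed-size block boundary, so none of the
block hashes match the original file afterwards.

import hashlib

BLOCK_SIZE = 8  # tiny block size for illustration; real systems use e.g. 128 KiB

def block_hashes(data: bytes) -> list:
    """Split data into fixed-size blocks and hash each block."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()[:8]
            for i in range(0, len(data), BLOCK_SIZE)]

original = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
modified = b"x" + original  # insert one byte at the front

print(block_hashes(original))
print(block_hashes(modified))
# Every block boundary has shifted, so no fixed-size block hash matches
# the original -- nothing dedups, even though the data is nearly identical.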
Does ZFS de-duplication work if the offsets of similar data blocks are
different?
