Agreed.

While a bitwise check is the only assured way to determine whether two blocks are duplicates, if the check were done in a streaming fashion as you suggest, the performance hit, though large compared to skipping verification entirely, would be more than bearable in an environment with a large known proportion of duplicate data, such as a large central backup zfs send target. The checksum metadata is sent first, then the data; the receiving system checks its db for a possible dupe and, if one is found, reads that block from local disk and compares it to the data as it arrives from the sender. If it reaches the end without finding a difference, it updates the block pointer to point at the existing duplicate. This won't save any bandwidth during the backup, but it will save on-disk space, and given the application, that could be very advantageous.
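
As a rough sketch of that receive-side flow (not ZFS code; checksum_index, read_local_block and write_new_block are hypothetical stand-ins for the receiver's dedup table and block I/O, and the whole incoming block is still read off the wire, matching the no-bandwidth-savings point above):

def receive_block(checksum, data_stream, checksum_index,
                  read_local_block, write_new_block):
    """Verify-on-receive dedup: reuse an existing block only if every byte matches."""
    candidate_addr = checksum_index.get(checksum)
    received = bytearray()

    if candidate_addr is not None:
        local = read_local_block(candidate_addr)   # candidate read back from local disk
        offset = 0
        identical = True
        for chunk in data_stream:                  # data still streams in from the sender
            received.extend(chunk)
            if identical and local[offset:offset + len(chunk)] != chunk:
                identical = False                  # mismatch: treat as a new block
            offset += len(chunk)
        if identical and offset == len(local):
            return candidate_addr                  # true duplicate: point at the existing block
    else:
        for chunk in data_stream:
            received.extend(chunk)

    addr = write_new_block(bytes(received))        # no dupe (or not identical): store normally
    checksum_index[checksum] = addr
    return addr

The byte-for-byte compare is what keeps this safe against checksum collisions: the existing block pointer is only reused after the whole incoming block has matched the local copy.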

Thank you for the insightful discussion on this. Within the electronic discovery and records and information management space, data deduplication and policy-based aging are the foremost topics of the day, but that work is done at the file level, so block-level deduplication would lend it no benefit regardless.

-=dave
 
 
