[EMAIL PROTECTED] wrote:
> 
> [EMAIL PROTECTED] wrote on 07/08/2008 03:08:26 AM:
> 
>>
>>> Does anyone know a tool that can look over a dataset and give
>>> duplication statistics? I'm not looking for something incredibly
>>> efficient but I'd like to know how much it would actually benefit our
>> Check out the following blog post:
>>
>> http://blogs.sun.com/erickustarz/entry/how_dedupalicious_is_your_pool
> 
> Just want to add: while this is OK for getting a ballpark dedup number,
> fletcher2 is notoriously collision-prone on real data sets.  It is meant to
> be fast at the expense of collisions.  Because colliding blocks are counted
> as duplicates, it can report much more possible dedup than really exists on
> large datasets.

Doing this using sha256 as the checksum algorithm would be much more 
interesting.  I'm going to try that now and see how it compares with 
fletcher2 for a small contrived test.
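
For anyone who wants to try the same kind of comparison outside of ZFS, here
is a rough sketch, not the method used in the blog post above.  It splits a
byte string into fixed-size blocks and reports total blocks divided by
unique checksums.  The function names and the 4096-byte block size are my own
assumptions, and zlib.adler32 is only a stand-in for a weak Fletcher-family
checksum; it is not ZFS's fletcher2.

```python
import hashlib
import zlib


def dedup_ratio(data: bytes, block_size: int, digest) -> float:
    """Return total_blocks / unique_blocks, as judged by `digest`.

    Two blocks count as duplicates whenever their digests match, so a
    collision-prone digest can only overstate the ratio, never understate it.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    unique = {digest(block) for block in blocks}
    return len(blocks) / len(unique)


def sha256_digest(block: bytes) -> bytes:
    # Strong hash: matching digests almost certainly mean identical blocks.
    return hashlib.sha256(block).digest()


def adler32_digest(block: bytes) -> int:
    # Weak 32-bit checksum, used here only as a stand-in (assumption),
    # NOT the fletcher2 implementation that ZFS uses.
    return zlib.adler32(block)


if __name__ == "__main__":
    # Small contrived test: three identical blocks plus one different block.
    sample = b"A" * 4096 * 3 + b"B" * 4096
    print("sha256 :", dedup_ratio(sample, 4096, sha256_digest))
    print("adler32:", dedup_ratio(sample, 4096, adler32_digest))
```

If the weak checksum reports a noticeably higher ratio than sha256 on the
same data, the difference is collisions rather than real duplication.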

-- 
Darren J Moffat
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
