Tim Spriggs wrote:
> Does anyone know of a tool that can look over a dataset and give 
> duplication statistics? I'm not looking for something incredibly 
> efficient, but I'd like to know how much it would actually benefit our 
> dataset: HiRISE has a large set of spacecraft data (images) that could 
> potentially have large amounts of redundancy, or not. Also, other 
> up-and-coming missions have large data volumes with a lot of duplicate 
> image info and small budgets; with "d11p" in OpenSolaris there is a 
> good business case to invest in Sun/OpenSolaris rather than buy the 
> cheaper storage (+ Linux?) that can simply hold everything as is.
>
> If someone feels like coding up a tool that basically makes a file of 
> checksums and counts how many times a particular checksum gets hit 
> over a dataset, I would be willing to run it and provide feedback. :)
>
> -Tim
>
>   

Me too.  Our data profile is just like Tim's: terabytes of satellite 
data.  I'm going to guess that the d11p ratio won't be fantastic for 
us.  I sure would like to measure it, though.
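
Something along these lines would probably do as a rough first cut: a 
minimal sketch that walks a directory tree, hashes every fixed-size 
chunk of every file, and counts how often each checksum repeats.  The 
128K chunk size and SHA-256 checksum are assumptions (picked to roughly 
mirror the default ZFS recordsize and a dedup-strength hash), not a 
claim about how the real dedup code slices data:

#!/usr/bin/env python
# Rough block-level dedup estimator: hash every fixed-size chunk of
# every file under a directory and count checksum repeats.
# Assumptions (not ZFS's exact behaviour): 128K chunks aligned to the
# start of each file, SHA-256 checksums, counter table fits in memory.

import hashlib
import os
import sys
from collections import Counter

CHUNK = 128 * 1024  # assumed block size

def main(root):
    counts = Counter()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, 'rb') as f:
                    while True:
                        chunk = f.read(CHUNK)
                        if not chunk:
                            break
                        counts[hashlib.sha256(chunk).digest()] += 1
                        total += 1
            except (IOError, OSError):
                continue  # skip unreadable files
    unique = len(counts)
    print("blocks scanned: %d" % total)
    print("unique blocks:  %d" % unique)
    if unique:
        print("dedup ratio:    %.2fx" % (float(total) / unique))

if __name__ == '__main__':
    main(sys.argv[1] if len(sys.argv) > 1 else '.')

Run it with the dataset root as the argument (e.g. "python 
dedup_estimate.py /archive/hirise", path made up).  The in-memory 
counter grows with the number of unique blocks, so on many terabytes 
you may need to run it per subtree or dump the counts to disk, but it 
should give a ballpark ratio.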

Jon

-- 


-     _____/     _____/      /           - Jonathan Loran -           -
-    /          /           /                IT Manager               -
-  _____  /   _____  /     /     Space Sciences Laboratory, UC Berkeley
-        /          /     /      (510) 643-5146 [EMAIL PROTECTED]
- ______/    ______/    ______/           AST:7731^29u18e3
                                 


