Hi, I'm interested in learning a bit more about the architecture. Specifically, what is the order of operations among splitting data into chunks, compression, deduplication, and encryption?
I have a large amount of data that is very similar from file to file, but each individual file is only somewhat compressible. My concern is that compression might reduce the potential for efficient deduplication... is that possible? Also, is compression done separately on each chunk, or on a per-file basis? Finally, will smaller chunks (max-file-size) produce better deduplication, just at the expense of more network operations?

For what it's worth, my data is mostly 3D numerical arrays, with each file a few GB in size.

Thanks so much.

Chris
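P.S. To make my worry about compression hurting deduplication concrete, here is a tiny self-contained Python sketch. It has nothing to do with S3QL's actual internals; the fixed 64 KiB chunk size, SHA-256 hashing, and per-file zlib compression are just assumptions for the illustration. It compares the fraction of duplicate chunks between two nearly identical files when chunking the raw bytes versus chunking after compressing each whole file:

import hashlib
import random
import zlib

CHUNK = 64 * 1024  # hypothetical chunk size, only for this demo

def chunks(data, size=CHUNK):
    # fixed-size chunking of a byte string
    return [data[i:i + size] for i in range(0, len(data), size)]

def dedup_ratio(blobs):
    # fraction of chunks that are duplicates of an already-seen chunk
    hashes = [hashlib.sha256(c).digest() for b in blobs for c in chunks(b)]
    return 1 - len(set(hashes)) / len(hashes)

# Two "similar" files: identical except for a small modified region,
# built from low-entropy data so they are somewhat compressible.
random.seed(0)
base = bytes(random.getrandbits(4) for _ in range(1_000_000))
file_a = base
file_b = base[:500_000] + b"modified" + base[500_008:]

print("dedup on raw chunks:           %.2f" % dedup_ratio([file_a, file_b]))
print("dedup on per-file compression: %.2f" %
      dedup_ratio([zlib.compress(file_a), zlib.compress(file_b)]))

If whole-file compression happened before chunking, I'd expect the second number to come out much lower than the first, which is exactly the effect I'm worried about.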
