On Aug 15 2018, [email protected] wrote:
> Hi, I am interested in learning a bit more about the architecture,
> specifically, what is the order of operations between splitting data into
> chunks, compression, deduplication, and encryption?
1. Split
2. Deduplication
3. Compression
4. Encryption
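To make the ordering concrete, here is a minimal sketch of that pipeline (not S3QL's actual code; chunk size, hashing, and the XOR "cipher" are illustrative stand-ins):

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # tiny for demonstration; real block sizes are much larger


def xor_encrypt(data: bytes, key: int = 0x5A) -> bytes:
    # Stand-in for real encryption (S3QL uses AES); XOR only for illustration.
    return bytes(b ^ key for b in data)


def store_file(data: bytes, backend: dict) -> list:
    """Store `data` chunk by chunk; return the list of chunk hashes.

    `backend` maps chunk hash -> compressed, encrypted blob.
    """
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]           # 1. split
        h = hashlib.sha256(chunk).hexdigest()    # 2. deduplicate by hash
        if h not in backend:                     # only store new chunks
            compressed = zlib.compress(chunk)    # 3. compress (per chunk)
            backend[h] = xor_encrypt(compressed) # 4. encrypt
        refs.append(h)
    return refs


backend = {}
refs = store_file(b"aaaabbbbaaaacccc", backend)
# Four chunks, but "aaaa" occurs twice, so only three blobs are stored.
print(len(refs), len(backend))
```

Because hashing happens before compression, two identical chunks always collide in the backend dict, regardless of how well they compress.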
> I have a large amount of data that is very similar, but each file is only
> somewhat compressible. My concern is that compression might have a negative
> effect on the potential for efficient deduplication... is this
> possible?
No. That could only happen if the data were split into blocks *after*
compression; since splitting comes first (see above), deduplication
operates on the uncompressed chunks.
> Also, is the compression done separately on each chunk, or on a per-file
> basis?
Per chunk.
> Finally, will smaller chunks (max-file-size) produce better
> deduplication (just at the expense of more network operations)?
Whether smaller chunks result in more deduplication depends on your
data, but they will certainly not result in less. They will, however,
also increase the size of your metadata DB.
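A toy illustration of the trade-off (assumed data, not a benchmark): counting unique chunks at two chunk sizes shows why smaller chunks can only find more duplicates, while producing more chunk references to track:

```python
import hashlib


def unique_chunks(data: bytes, size: int) -> int:
    """Number of distinct chunks when `data` is split into `size`-byte pieces."""
    hashes = {hashlib.sha256(data[i:i + size]).digest()
              for i in range(0, len(data), size)}
    return len(hashes)


# Three "files" that share 4-byte runs but differ at 8-byte granularity.
data = b"AAAABBBB" + b"AAAACCCC" + b"AAAADDDD"

print(unique_chunks(data, 8))  # 3 unique 8-byte chunks: nothing deduplicates
print(unique_chunks(data, 4))  # 4 unique 4-byte chunks: "AAAA" deduplicates
```

With 8-byte chunks all 3 chunks are unique (24 bytes stored); with 4-byte chunks only 4 of 6 chunks are unique (16 bytes stored), at the cost of tracking 6 references instead of 3.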
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«