[ 
https://issues.apache.org/jira/browse/JAMES-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062510#comment-18062510
 ] 

Benoit Tellier commented on JAMES-4182:
---------------------------------------

> Neither is it where the metadata will be allocated 

Isn't allocating `content-encoding: zstd` the entire point of this metadata axis,
with a BlobStoreDAO responsible for it?

BlobId -> allocated by the BlobStore
metadata -> added as needed by the BlobStoreDAO
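A minimal sketch of that division of responsibility (all names are illustrative, not the actual James API; a SHA-256 hex string and a plain Map stand in for the real BlobId and metadata types):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

public class MetadataAllocationSketch {
    // e.g. a hash-based id, as the BlobStore would allocate it
    static String allocateBlobId(byte[] payload) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(payload);
        return HexFormat.of().formatHex(digest);
    }

    // the DAO adds metadata as needed, e.g. the compression marker
    static Map<String, String> allocateMetadata(boolean compressed) {
        Map<String, String> metadata = new HashMap<>();
        if (compressed) {
            metadata.put("content-encoding", "zstd");
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = "some mime message".getBytes(StandardCharsets.UTF_8);
        String blobId = allocateBlobId(payload);
        Map<String, String> metadata = allocateMetadata(true);
        System.out.println(blobId.length());
        System.out.println(metadata.get("content-encoding"));
    }
}
```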

> S3 object compression
> ---------------------
>
>                 Key: JAMES-4182
>                 URL: https://issues.apache.org/jira/browse/JAMES-4182
>             Project: James Server
>          Issue Type: Improvement
>          Components: s3
>            Reporter: Benoit Tellier
>            Priority: Major
>
> h3. Why?
> As a James operator I want to save money on my S3 cloud bill.
> Operating a mail solution, I notice that 50% of the cloud cost is S3 storage.
> Garbage collection + data tiering are in place (attachment tiering as far as S3 
> is concerned) but I want further mechanisms at hand to reduce the 
> bill.
> On the gains side:
>  - Attachment payloads are mostly already compressed - this is mitigated by attachment 
> tiering
>  - MIME is base64 encoded - we can expect a minimum compression ratio of 30%
>  - On a LINAGORA workload I measured a compression ratio of ~0.55
> h3. What?
> Optional ZSTD compression in James ObjectStores.
> I wish for a fully backward-compatible mechanism that only decompresses compressed 
> data, ideally using `content-encoding` metadata on the s3 object as a 
> compression marker.
> I also wish for:
>  - A size threshold (16KB by default)
>  - A minimum compression ratio (defaulting to 1 - only compress if meaningful)
> In `blob.properties`:
> {code:java}
> compression.enabled=true
> compression.size.threshold=16K
> compression.min.ratio=0.8
> {code}
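For illustration, a self-contained sketch of the proposed decision logic with those defaults. GZIP from `java.util.zip` stands in for ZSTD (which would need a binding such as zstd-jni), and all names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;

public class CompressionDecisionSketch {
    static final int SIZE_THRESHOLD = 16 * 1024; // compression.size.threshold=16K
    static final double MIN_RATIO = 0.8;         // compression.min.ratio=0.8

    static byte[] compress(byte[] payload) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(payload);
        }
        return out.toByteArray();
    }

    // Returns the bytes to store, compressing only when it pays off.
    static byte[] maybeCompress(byte[] payload) throws Exception {
        if (payload.length < SIZE_THRESHOLD) {
            return payload; // below the size threshold: store as-is
        }
        byte[] compressed = compress(payload);
        double ratio = (double) compressed.length / payload.length;
        return ratio <= MIN_RATIO ? compressed : payload;
    }

    public static void main(String[] args) throws Exception {
        byte[] small = new byte[1024];                 // below 16K threshold
        byte[] repetitive = "base64base64".repeat(8192).getBytes();
        System.out.println(maybeCompress(small).length == small.length);
        System.out.println(maybeCompress(repetitive).length < repetitive.length);
    }
}
```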
> I propose to implement this directly in the S3BlobStoreDAO. Sadly, the 
> BlobStoreDAO abstraction lacks the needed metadata concept to 
> know whether the data has been compressed or not.
> h3. Bringing the design even further
> What I will actually *really* do is:
>  - never compress in James
>  - leverage external Java code to compress data of older generations (> 1 
> month)
>  - and have James transparently serve the older compressed data
> With this:
>  - Most of the data is stored compressed - massive storage gains.
>  - Data compression is fully asynchronous
>  - 90% minimum of the read traffic is served uncompressed
> h3. Alternatives
> The most controversial part of the proposal is that it does not do this by composing 
> the BlobStoreDAO - the BlobStoreDAO actually lacks a `metadata` concept that would allow this.
> We *could* bring this metadata (Map<String, String>) to the BlobStoreDAO
> I would propose something like this for the data model:
> {code:java}
> sealed interface Blob {
>     Map<String, String> metadata();
>     // Have the POJOs encode some conversions?
>     InputStreamBlob payloadAsStream();
>     BytesBlob asBytes();
>     ByteSourceBlob asByteSource(); 
> }
> record BytesBlob {...}
> record InputStreamBlob {...}
> record ByteSourceBlob {...}
> record StringBlob {...}
> record ByteFluxBlob {...} // Flux<ByteBuffer>
> {code}
> We could then refine the BlobStoreDAO interface:
> {code:java}
> public interface BlobStoreDAO {
>     // implementations to pattern match on data!
>     Publisher<Void> save(BucketName bucketName, BlobId blobId, Blob data); 
>     Publisher<BytesBlob> readBytes(BucketName bucketName, BlobId blobId);
>     Publisher<InputStreamBlob> readReactive(BucketName bucketName, BlobId blobId);
>     InputStreamBlob read(BucketName bucketName, BlobId blobId);
>     // delete* + list* methods unchanged
> } 
> {code}
> Please note that implementations that do not support metadata (file, Cassandra) shall 
> THROW.
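A hypothetical sketch of that contract (names invented for illustration; the real DAOs take a BucketName/BlobId and return Publishers):

```java
import java.util.Map;

// A metadata-incapable DAO (file, Cassandra) rejects saves carrying metadata.
public class NoMetadataDaoSketch {
    static void save(byte[] payload, Map<String, String> metadata) {
        if (!metadata.isEmpty()) {
            throw new UnsupportedOperationException(
                "This BlobStoreDAO implementation does not support metadata");
        }
        // ... plain save path, unchanged
    }

    public static void main(String[] args) {
        save(new byte[0], Map.of()); // metadata-free save still works
        try {
            save(new byte[0], Map.of("content-encoding", "zstd"));
            System.out.println("no throw");
        } catch (UnsupportedOperationException e) {
            System.out.println("threw as specified");
        }
    }
}
```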
> Upsides:
>  - independent from S3: we do not make the S3 code more complex, we compose over it
>  - independent from S3: we could actually reuse this for other blob stores 
> (if any)
>  - benefit for encryption: to benefit from compression, we need to 
> compress then encrypt; encrypt then compress yields zero benefit. By making 
> compression an S3 concern we would be forced to encrypt then compress.
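A quick self-contained check of that ordering argument, with GZIP standing in for ZSTD and AES/CTR for the encryption layer (all names illustrative; zero key/IV used only to keep the demo deterministic):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class OrderingSketch {
    static byte[] compress(byte[] data) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        }
        return out.toByteArray();
    }

    static byte[] encrypt(byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE,
            new SecretKeySpec(new byte[16], "AES"),
            new IvParameterSpec(new byte[16]));
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] mime = "Content-Transfer-Encoding: base64\r\n".repeat(4096)
            .getBytes();
        // compress then encrypt: the redundancy is removed before encryption
        int compressThenEncrypt = encrypt(compress(mime)).length;
        // encrypt then compress: ciphertext looks random, nothing to remove
        int encryptThenCompress = compress(encrypt(mime)).length;
        System.out.println(compressThenEncrypt < mime.length / 2);
        System.out.println(encryptThenCompress > (int) (mime.length * 0.99));
    }
}
```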
> Downside: major refactoring needed...
> Opinions?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
