[
https://issues.apache.org/jira/browse/JAMES-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062504#comment-18062504
]
Jean Helou commented on JAMES-4182:
-----------------------------------
> is not where the blobId is allocated
Neither is it where the metadata will be allocated :)
> Here you are https://github.com/apache/james-project/pull/2960
thanks
> S3 object compression
> ---------------------
>
> Key: JAMES-4182
> URL: https://issues.apache.org/jira/browse/JAMES-4182
> Project: James Server
> Issue Type: Improvement
> Components: s3
> Reporter: Benoit Tellier
> Priority: Major
>
> h3. Why?
> As a James operator, I want to save money on my S3 cloud bill.
> Operating a mail solution, I notice that 50% of the cloud cost is S3 storage.
> Garbage collection + data tiering is in place (attachment tiering as far as S3
> is concerned) but I want to have further mechanisms at hand to reduce the
> bill.
> On the gains side:
> - Attachment payload is mostly compressed - this is mitigated by attachment
> tiering
> - MIME is base64 encoded - we can expect a minimum compression ratio of 30%
> - On the LINAGORA workload I observed a compression ratio of ~0.55
> h3. What?
> Optional ZSTD compression in James ObjectStores.
> I would like a fully backward-compatible mechanism that only decompresses
> data marked as compressed, ideally using `content-encoding` metadata on the
> s3 object as a compression marker.
> I would also like:
> - A size threshold (16KB by default)
> - A minimum compression ratio (defaulting to 1 - only compress if meaningful)
> In `blob.properties`:
> {code:java}
> compression.enabled=true
> compression.size.threshold=16K
> compression.min.ratio=0.8
> {code}
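The two thresholds above can be read as a single store-time decision. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, the ratio is assumed to mean compressed size divided by original size (consistent with the ~0.55 figure quoted above), and the JDK's DEFLATE codec stands in for ZSTD so the example runs without extra dependencies.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical helper, not part of James.
final class CompressionDecision {
    // Assumed to mirror compression.size.threshold=16K
    static final int SIZE_THRESHOLD_BYTES = 16 * 1024;
    // Assumed to mirror compression.min.ratio=0.8 (compressed size / original size)
    static final double MIN_RATIO = 0.8;

    // Stand-in codec: DEFLATE from the JDK instead of ZSTD.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    // True only when the payload reaches the size threshold
    // and the compressed form is small enough to be worth storing.
    static boolean worthStoringCompressed(byte[] original, byte[] compressed) {
        if (original.length < SIZE_THRESHOLD_BYTES) {
            return false;
        }
        double ratio = (double) compressed.length / original.length;
        return ratio <= MIN_RATIO;
    }
}
```

With this reading, a small blob is always stored as-is, and a large blob that barely shrinks (already compressed attachments, for instance) is stored as-is too.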
> I propose to implement this directly in the S3BlobStoreDAO. Sadly the
> BlobStoreDAO abstraction lacks the metadata needed to know whether the
> data has been compressed or not.
> h3. Bringing the design even further
> What I will actually *really* do is:
> - never compress in James
> - use external Java code to compress data of older generations (> 1
> month)
> - and have James transparently serve older compressed data
> With this:
> - Most of the data is stored compressed - massive storage gains.
> - Data compression is fully asynchronous
> - 90% minimum of the read traffic is served uncompressed
> h3. Alternatives
> The most controversial part of the proposal is not to do this by composing
> the BlobStoreDAO - I actually miss a `metadata` concept to do this.
> We *could* bring this metadata (Map<String, String>) to the BlobStoreDAO.
> I would propose something like this for the data model:
> {code:java}
> sealed interface Blob {
>     Map<String, String> metadata();
>     // Have the POJOs encode some conversions ?
>     InputStreamBlob payloadAsStream();
>     BytesBlob asBytes();
>     ByteSourceBlob asByteSource();
> }
> record BytesBlob {...}
> record InputStreamBlob {...}
> record ByteSourceBlob {...}
> record StringBlob {...}
> record ByteFluxBlob {...} // Flux<ByteBuffer>
> {code}
> We could then refine the BlobStoreDAO interface:
> {code:java}
> public interface BlobStoreDAO {
>     // implementations to pattern match on data!
>     Publisher<Void> save(BucketName bucketName, BlobId blobId, Blob data);
>     Publisher<BytesBlob> readBytes(BucketName bucketName, BlobId blobId);
>     Publisher<InputStreamBlob> readReactive(BucketName bucketName, BlobId blobId);
>     InputStreamBlob read(BucketName bucketName, BlobId blobId);
>     // delete* + list* methods unchanged
> }
> {code}
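If metadata were available on the DAO as proposed, the backward-compatible read path could be composed on top of it: decompress only when the `content-encoding` marker is present, and return anything else untouched. The following is a rough, non-reactive sketch of that idea. `SimpleBlob`, `MarkerAwareStore`, and the in-memory map are hypothetical stand-ins rather than James types, and DEFLATE again replaces ZSTD so the example stays dependency-free.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical composition over a metadata-aware store, not James code.
final class MarkerAwareStore {
    // Simplified blob: payload bytes plus string metadata.
    static final class SimpleBlob {
        final byte[] payload;
        final Map<String, String> metadata;
        SimpleBlob(byte[] payload, Map<String, String> metadata) {
            this.payload = payload;
            this.metadata = metadata;
        }
    }

    static final String MARKER_KEY = "content-encoding";
    static final String MARKER_VALUE = "deflate"; // stand-in for a ZSTD marker value
    // In-memory stand-in for the underlying blob store.
    static final Map<String, SimpleBlob> BACKEND = new HashMap<>();

    // Legacy write path: no marker, payload stored as-is.
    static void saveRaw(String blobId, byte[] data) {
        BACKEND.put(blobId, new SimpleBlob(data, new HashMap<>()));
    }

    // Compressed write path: payload compressed and marked in metadata.
    static void saveCompressed(String blobId, byte[] data) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(MARKER_KEY, MARKER_VALUE);
        BACKEND.put(blobId, new SimpleBlob(deflate(data), metadata));
    }

    // Read path: decompress only when the marker is present.
    static byte[] read(String blobId) {
        SimpleBlob blob = BACKEND.get(blobId);
        if (MARKER_VALUE.equals(blob.metadata.get(MARKER_KEY))) {
            return inflate(blob.payload);
        }
        return blob.payload;
    }

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int read = inflater.inflate(buffer);
                if (read == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("Truncated compressed payload");
                }
                out.write(buffer, 0, read);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted compressed payload", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Blobs written before compression existed carry no marker, so they keep being served unchanged, which is the compatibility property asked for above.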
> Please note that implementations that do not support metadata (file,
> cassandra) shall THROW.
> Upsides:
> - independent from S3: we do not make the s3 code more complex, we compose
> over it
> - independent from S3: we could actually reuse this for other blob stores
> (if any)
> - benefit for encryption: to benefit from compression, we need to
> compress then encrypt. Encrypting then compressing yields zero benefit. By
> making compression an s3 concern we would be forced to encrypt then compress.
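The ordering argument in the last bullet can be checked with a small experiment. The sketch below is illustrative only: the class name is hypothetical, DEFLATE stands in for ZSTD, and the all-zero AES key and IV are there purely to keep the demo deterministic and must never be used for real data.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.util.zip.Deflater;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class, not James code.
final class OrderingDemo {
    // Stand-in codec: DEFLATE from the JDK instead of ZSTD.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    // AES-CBC with a fixed all-zero key and IV: demo only, insecure.
    static byte[] encrypt(byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(new byte[16], "AES"),
                new IvParameterSpec(new byte[16]));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a repetitive, text-like payload of roughly the requested size.
    static byte[] samplePayload(int size) {
        byte[] line = "Content-Transfer-Encoding: base64\r\n".getBytes();
        byte[] payload = new byte[size];
        for (int i = 0; i < size; i++) {
            payload[i] = line[i % line.length];
        }
        return payload;
    }

    static int compressThenEncryptSize(byte[] payload) {
        return encrypt(compress(payload)).length;
    }

    static int encryptThenCompressSize(byte[] payload) {
        return compress(encrypt(payload)).length;
    }
}
```

Calling both size methods on `OrderingDemo.samplePayload(64 * 1024)` should show the compress-then-encrypt result far below the input size, while the encrypt-then-compress result stays close to it, because ciphertext looks random to the compressor.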
> Downside: major refactoring needed...
> Opinions?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)