Hello server devs,

While auditing a slow Cassandra deployment in a performance test environment, I noticed that ~25% of the data is garbage left by the Cassandra projection of the RabbitMQ mail queue, as the following table stats demonstrate:

    Table: enqueuedmailsv3
    SSTable count: 327
    Space used (live): 4962189078
    Space used (total): 4962189078
    Space used by snapshots (total): 0
    Off heap memory used (total): 4716757
    SSTable Compression Ratio: 0.33271449206498704
    Number of partitions (estimate): 6246

    Table: deletedmailsv2
    SSTable count: 69
    Space used (live): 1132247647
    Space used (total): 1132247647
    Space used by snapshots (total): 0
    Off heap memory used (total): 28743224
    SSTable Compression Ratio: 0.5380381348994696
    Number of partitions (estimate): 17669157

We use roughly 6 GB for an empty mail queue. A bit of cleanup would be welcome.

The following document presents the design of the RabbitMQ mail queue:

https://github.com/apache/james-project/blob/master/src/adr/0031-distributed-mail-queue.md

The following document presents a design that solves the aforementioned issue but was sadly never implemented:

https://github.com/apache/james-project/blob/master/src/adr/0032-distributed-mail-queue-cleanup.md

This also means that people who have deduplication turned off never delete the associated blobs.

I will open a PR updating the status of this ADR. Implementing it will end up on Linagora's short-to-medium-term TODO list.

Cheers,

Benoit
