Hello server devs,

While auditing a slow Cassandra cluster in a performance testing
environment, I noticed that ~25% of the data is garbage belonging to the
Cassandra projection of the RabbitMQ mailqueue, as the following table
stats demonstrate:

        Table: enqueuedmailsv3
        SSTable count: 327
        Space used (live): 4962189078
        Space used (total): 4962189078
        Space used by snapshots (total): 0
        Off heap memory used (total): 4716757
        SSTable Compression Ratio: 0.33271449206498704
        Number of partitions (estimate): 6246

        Table: deletedmailsv2
        SSTable count: 69
        Space used (live): 1132247647
        Space used (total): 1132247647
        Space used by snapshots (total): 0
        Off heap memory used (total): 28743224
        SSTable Compression Ratio: 0.5380381348994696
        Number of partitions (estimate): 17669157

An empty mail queue takes up to 6 GB. A bit of cleanup would be
welcome.
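For the record, the 6 GB figure is simply the sum of the "Space used
(live)" values reported for the two tables above. The quick check below
copies those byte counts verbatim; the dictionary and variable names are
illustrative only.

```python
# "Space used (live)" in bytes, copied from the table stats above
space_used_live = {
    "enqueuedmailsv3": 4962189078,
    "deletedmailsv2": 1132247647,
}

total_bytes = sum(space_used_live.values())

# Report the total in both decimal (GB) and binary (GiB) units
print(f"total: {total_bytes} bytes")
print(f"       {total_bytes / 10**9:.2f} GB (decimal)")
print(f"       {total_bytes / 2**30:.2f} GiB (binary)")
```

This prints 6094436725 bytes, i.e. about 6.09 GB (5.68 GiB), for a queue
that holds no mail.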

The following document presents the design of the RabbitMQ mailqueue:
https://github.com/apache/james-project/blob/master/src/adr/0031-distributed-mail-queue.md

The following document presents a design that solves the aforementioned
issue, but it was sadly never implemented:
https://github.com/apache/james-project/blob/master/src/adr/0032-distributed-mail-queue-cleanup.md

This also means that people running with blob deduplication turned off
never delete the associated blobs.

I will open a PR updating the status of this ADR, which will end up on
Linagora's short-to-mid-term TODO list.

Cheers,

Benoit
