This is an automated email from the ASF dual-hosted git repository.

btellier pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git

commit 3a8c5beeba4a02e1c437487a8d48ea5603e378d9
Author: Benoit Tellier <[email protected]>
AuthorDate: Mon Apr 13 12:12:41 2020 +0700

    [ADR] Distributed Mail Queue Cleanup
---
 src/adr/0032-distributed-mail-queue-cleanup.md | 53 ++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/src/adr/0032-distributed-mail-queue-cleanup.md b/src/adr/0032-distributed-mail-queue-cleanup.md
new file mode 100644
index 0000000..5f2202f
--- /dev/null
+++ b/src/adr/0032-distributed-mail-queue-cleanup.md
@@ -0,0 +1,53 @@
+# 32. Distributed Mail Queue Cleanup
+
+Date: 2020-04-13
+
+## Status
+
+Proposed
+
+## Context
+
+Read [Distributed Mail Queue](0031-distributed-mail-queue.md) for full context.
+
+**enqueuedMailsV3** and **deletedMailsV2** are never cleaned up and the corresponding blobs are always referenced. This is not
+ideal from either a privacy or a storage cost point of view.
+
+Note that **enqueuedMailsV3** and **deletedMailsV2** rely on timeWindowCompactionStrategy.
+
+## Decision
+
+Add a new `contentStart` table that records, for each mail queue, the point in time from which that mailQueue holds data.
+
+The values contained between `contentStart` and `browseStart` can safely be deleted.
+
+We can perform this cleanup upon `browseStartUpdate`: once it has finished, we can browse and then delete the content of **enqueuedMailsV3**
+and **deletedMailsV2** contained between `contentStart` and the new `browseStart`, after which we can safely set `contentStart`
+to the new `browseStart`.
+
+Content before `browseStart` can safely be considered deletable, and is no longer exposed by the application. We don't need an
+additional grace period mechanism for `contentStart`.
+
+A failed cleanup will lead to the content eventually being cleaned up upon the next `browseStart` update.
+
+We will furthermore delete blobStore content upon dequeue, as well as when the mail has been deleted or purged via the MailQueue
+management APIs.
+
+## Consequences
+
+All Cassandra SSTables before `browseStart` can safely be dropped as part of the timeWindowCompactionStrategy.
+
+Updating browse start will then be twice as expensive, as we need to unreference past slices.
+
+Eventually this will allow reclaiming Cassandra disk space and will enforce mail privacy by removing dangling metadata.
+
+## Alternative
+
+A [proposal](https://github.com/linagora/james-project/pull/3291#pullrequestreview-393501339) was made to piggyback
+cleanup upon dequeue/delete operations. The dequeuer/deleter then directly removes the related metadata from
+`enqueuedMailsV3` and `deletedMailsV2`. This simpler design however has several flaws:
+
+ - if the cleanup fails for any reason then it cannot be retried in the future. There will be no way of cleaning up the
+ related data.
+ - this will end up tombstoning live slices, potentially harming the performance of browse, delete and browse start updates.
+ - this proposal does not leverage timeWindowCompactionStrategy as efficiently.
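
As a reading aid for the Decision section of the ADR above, here is a minimal in-memory sketch of the proposed cleanup. It is not part of the commit and not James code: the class `MailQueueCleanupSketch`, its methods and the `TreeMap` standing in for the **enqueuedMailsV3** / **deletedMailsV2** slices are all hypothetical, and the real tables live in Cassandra. It only illustrates the ordering the ADR describes: delete what lies between `contentStart` and the new `browseStart`, then advance `contentStart`.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model; none of these names come from the James code base.
class MailQueueCleanupSketch {
    // Slice start -> metadata rows, standing in for enqueuedMailsV3 / deletedMailsV2.
    final TreeMap<Instant, List<String>> slices = new TreeMap<>();
    Instant contentStart;
    Instant browseStart;

    MailQueueCleanupSketch(Instant start) {
        this.contentStart = start;
        this.browseStart = start;
    }

    void enqueue(Instant sliceStart, String mailKey) {
        slices.computeIfAbsent(sliceStart, k -> new ArrayList<>()).add(mailKey);
    }

    // Moves browseStart forward, then removes the slices that are no longer exposed.
    // Returns the number of slices removed.
    int updateBrowseStart(Instant newBrowseStart) {
        if (!newBrowseStart.isAfter(contentStart)) {
            return 0; // nothing between contentStart and the new browseStart
        }
        browseStart = newBrowseStart;

        // Slices in [contentStart, newBrowseStart) can safely be deleted.
        NavigableMap<Instant, List<String>> stale =
            slices.subMap(contentStart, true, newBrowseStart, false);
        int removed = stale.size();
        stale.clear(); // the view is backed by 'slices', so this deletes from it

        // Advance contentStart only after the deletion. If the deletion had thrown,
        // contentStart would be unchanged and the next update would cover the same range.
        contentStart = newBrowseStart;
        return removed;
    }
}
```

Because `contentStart` moves last, a failed run leaves the stale range inside the next run's window, which matches the ADR's point that the alternative design has no comparable way to retry a failed cleanup.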

