This is an automated email from the ASF dual-hosted git repository.

btellier pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git

commit c4d5620ecddceec23f499c42f9829265580502bb
Author: RĂ©mi KOWALSKI <[email protected]>
AuthorDate: Tue Feb 18 13:43:30 2020 +0100

    [ADR] Distributed blob garbage collector
---
 src/adr/0029-distributed-blob-garbage-collector.md | 450 +++++++++++++++++++++
 1 file changed, 450 insertions(+)

diff --git a/src/adr/0029-distributed-blob-garbage-collector.md 
b/src/adr/0029-distributed-blob-garbage-collector.md
new file mode 100644
index 0000000..36b3bdc
--- /dev/null
+++ b/src/adr/0029-distributed-blob-garbage-collector.md
@@ -0,0 +1,450 @@
+# 29. Distributed blob garbage collector
+
+Date: 2020-02-18
+
+## Status
+
+Proposed
+
+## Context
+
+The body, headers, attachments of the mails are stored as blobs in a blob 
store.
+In order to save space in those stores, those blobs are de-duplicated using a 
hash of their content.
+To attain that the current blob store will read the content of the blob before 
saving it, and generate its id based on
+a hash of this content. This way two blobs with the same content will share 
the same id and thus be saved only once.
+This makes the safe deletion of one of those blobs a non trivial problem as we 
can't delete one blob without ensuring
+that all references to it are themselves deleted. For example if two messages 
share the same blob, when we delete
+one message there is at the time being no way to tell if the blob is still 
referenced by another message.
+
+## Decision
+
+To address this issue, we propose to implement a distributed blob garbage 
collector built upon the previously developed
+Distributed Task Manager.
+The de-duplicating blob store will keep track of the references pointing 
toward a blob in a `References` table.
+It will also keep track of the deletion requests for a blob in a `Deletions` 
table.
+When the garbage collector algorithm runs it will fetch from the `Deletions` 
table the blobs considered to be effectively deleted,
+and will check in the `References` table if there are still some references to 
them. If there is no more reference to a blob,
+it will be effectively deleted from the blob store.
+
+To avoid concurrency issues, where we could garbage collect a blob at the same 
time a new reference to it appear,
+a `reference generation` notion will be added. The de-duplicating id of the 
blobs which before where constructed
+using only the hash of their content,  will now include this `reference 
generation` too.
+At a given interval a new `reference generation` will be emitted, since then 
all new blobs will point to this new generation.
+
+So a `garbage collection iteration` will run only on the `reference 
generation` `n-2` to avoid concurrency issues.
+
+The switch of generation will be triggered by a task running on the 
distributed task manager. This task will
+emit an event into the event sourcing system to increment the `reference 
generation`.
+
+
+## Alternatives
+
+Not de-duplicating the blobs' content, this simple approach which involves 
storing the same
+blob a lot of times can in some scenario be really slow and costly. Albeit it 
can in some case be preferred for the sake of
+simplicity, data security...
+
+## Consequences
+
+This change will necessitate to extract the base blob store responsibilities 
(store a blob, delete a blob, read a blob)
+from the current blob store implementation which is doing the de-duplication, 
id generation...
+The garbage collector will use this low level blob store in order to 
effectively delete the blobs.
+
+One other consequence of this work, is the fact that there will be no  
de-duplication on different `reference generation`,
+i.e two blobs with the same content will be stored twice now, if they were 
created during two different `reference generation`.
+
+When writing a blob into the de-duplicating blob store, we will need to 
specify the reference to the object (MessageId, AttachmentId...) we
+store the blob for. This can make some components harder to implement as we 
will have to propagate the references.
+
+Since we will not build a distributed task scheduler. To increment the 
`reference generation` and launch periodically a
+`garbage collection iteration`, the scheduling will be done by an external 
scheduler (cron job, kubernetes cronjob ...)
+ which will call a webadmin endpoint to launch this task periodically.
+
+## Algorithm visualisation
+
+### Generation 1 and Iteration 1
+
+ * Events
+   * `rg1` reference generation is emitted
+   * `gci1` garbage collection iteration is emitted
+   * An email is sent to `user1`, a `m1` message, and a blob `b1` are stored 
with `rg1`
+   * An email is sent to `user1` and `user2`, `m2` and `m3` messages, and a 
blob `b2` are stored with `rg1`
+
+#### Tables
+
+##### Generations
+
+| reference generation id |
+|-------------------------|
+| rg1                     |
+
+| garbage collection iteration id |
+|---------------------------------|
+| gci1                            |
+
+##### Blobs
+
+| blob id | reference generation id |
+|---------|-------------------------|
+| b1      | rg1                     |
+| b2      | rg1                     |
+
+##### References
+
+| message id | blob id | reference generation id |
+|------------|---------|-------------------------|
+| m1         | b1      | rg1                     |
+| m2         | b2      | rg1                     |
+| m3         | b2      | rg1                     |
+
+##### Deletions
+
+Empty
+
+### Generation 2 / Iteration 2
+
+ * Events
+   * `rg2` reference generation is emitted
+   * `gci2` garbage collection iteration is emitted
+   * An email is sent to `user1`, a `m4` message, and a blob `b3` are stored 
with `rg2`
+   * An email is sent to `user1` and `user2`, `m5` and `m6` messages, and a 
blob `b4` are stored with `rg2`
+
+#### Tables
+
+##### Generations
+
+
+| reference generation id |
+|-------------------------|
+| rg1                     |
+| rg2                     |
+
+| garbage collection iteration id |
+|---------------------------------|
+| gci1                            |
+| gci2                            |
+
+##### Blobs
+
+| blob id | reference generation id |
+|---------|-------------------------|
+| b1      | rg1                     |
+| b2      | rg1                     |
+| b3      | rg2                     |
+| b4      | rg2                     |
+
+##### References
+
+| message id | blob id | reference generation id |
+|------------|---------|-------------------------|
+| m1         | b1      | rg1                     |
+| m2         | b2      | rg1                     |
+| m3         | b2      | rg1                     |
+| m4         | b3      | rg2                     |
+| m5         | b4      | rg2                     |
+| m6         | b4      | rg2                     |
+
+##### Deletions
+
+Empty
+
+### Generation 3 / Iteration 3
+
+ * Events
+   * `rg3` reference generation is emitted
+   * `gci3` garbage collection iteration is emitted
+   * An email is sent to `user1`, a `m7` message, and a blob `b5` are stored 
with `rg3`
+   * An email is sent to `user1` and `user2`, `m8` and `m9` messages, and a 
blob `b6` are stored with `rg3`
+   * `user1` deletes `m1`, `m2`, `m7`, and `m8` with `gi3`
+   * `user2` deletes `m3` with `gi3`
+
+#### Tables: before deletions
+
+##### Generations
+
+| reference generation id |
+|-------------------------|
+| rg1                     |
+| rg2                     |
+| rg3                     |
+
+| garbage collection iteration id |
+|---------------------------------|
+| gci1                            |
+| gci2                            |
+| gci3                            |
+
+##### Blobs
+
+| blob id | reference generation id |
+|---------|-------------------------|
+| b1      | rg1                     |
+| b2      | rg1                     |
+| b3      | rg2                     |
+| b4      | rg2                     |
+| b5      | rg3                     |
+| b6      | rg3                     |
+
+##### References
+
+| message id | blob id | reference generation id |
+|------------|---------|-------------------------|
+| m1         | b1      | rg1                     |
+| m2         | b2      | rg1                     |
+| m3         | b2      | rg1                     |
+| m4         | b3      | rg2                     |
+| m5         | b4      | rg2                     |
+| m6         | b4      | rg2                     |
+| m7         | b5      | rg3                     |
+| m8         | b6      | rg3                     |
+| m9         | b6      | rg3                     |
+
+##### Deletions
+
+Empty
+
+
+#### Tables: after deletions
+
+
+##### Generations
+
+| reference generation id |
+|-------------------------|
+| rg1                     |
+| rg2                     |
+| rg3                     |
+
+| garbage collection iteration id |
+|---------------------------------|
+| gci1                            |
+| gci2                            |
+| gci3                            |
+
+##### Blobs
+
+| blob id | reference generation id |
+|---------|-------------------------|
+| b1      | rg1                     |
+| b2      | rg1                     |
+| b3      | rg2                     |
+| b4      | rg2                     |
+| b5      | rg3                     |
+| b6      | rg3                     |
+
+##### References
+
+| message id | blob id | reference generation id |
+|------------|---------|-------------------------|
+| m4         | b3      | rg2                     |
+| m5         | b4      | rg2                     |
+| m6         | b4      | rg2                     |
+| m9         | b6      | rg3                     |
+
+##### Deletions
+
+| blob id | reference generation id | date  | garbage collection iteration id |
+|---------|-------------------------|-------|---------------------------------|
+| b1      | rg1                     | 10:42 | gci3                            |
+| b2      | rg1                     | 10:42 | gci3                            |
+| b2      | rg1                     | 13:37 | gci3                            |
+| b5      | rg3                     | 10:42 | gci3                            |
+| b6      | rg3                     | 10:42 | gci3                            |
+
+#### Running the algorithm
+
+ * fetch `Deletions` for `gci3` in `deletions`
+ * find distinct `reference-generation-id` of `deletions` in `generations = 
{rg1, rg3}`
+ * For each generation
+   * *rg1*
+     * filter `deletions` to keep only `rg1` entries and extract `blob-ids` in 
`concernedBlobs = {b1, b2}`
+     * fetch all references to `concernedBlobs` and build a Bloom-Filter in 
`foundedReferences = {}`
+     * filter `concernedBlobs` to keep only those which are not present in 
`foundedReferences` in `blobsToDelete = {b1, b2}`
+     * Remove `blobsToDelete` from `Blobs` and `Deletions`
+   * *rg3*
+     * filter `deletions` to keep only `rg3` entries and extract `blob-ids` in 
`concernedBlobs = {b5, b6}`
+     * fetch all references to `concernedBlobs` and build a Bloom-Filter in 
`foundedReferences = {b6}`
+     * filter `concernedBlobs` to keep only those which are not present in 
`foundedReferences` in `blobsToDelete = {b5}`
+     * Remove `blobsToDelete` from `Blobs` and `Deletions`
+
+
+#### Tables: after garbage collection
+
+##### Generations
+
+
+| reference generation id |
+|-------------------------|
+| rg1                     |
+| rg2                     |
+| rg3                     |
+
+| garbage collection iteration id |
+|---------------------------------|
+| gci1                            |
+| gci2                            |
+| gci3                            |
+
+##### Blobs
+
+| blob id | reference generation id |
+|---------|-------------------------|
+| b3      | rg2                     |
+| b4      | rg2                     |
+| b6      | rg3                     |
+
+##### References
+
+| message id | blob id | generation id |
+|------------|---------|---------------|
+| m4         | b3      | g2            |
+| m5         | b4      | g2            |
+| m6         | b4      | g2            |
+| m9         | b6      | g3            |
+
+##### Deletions
+
+| blob id | reference generation id | date  | garbage collection iteration id |
+|---------|-------------------------|-------|---------------------------------|
+| b6      | rg3                     | 10:42 | gci3                            |
+
+### Generations 4
+
+ * Events
+   * `rg4` reference generation is emitted
+   * `gci4` garbage collection iteration is emitted
+   * `user2` deletes `m9` with `gcg4`
+
+#### Tables: before deletions
+
+##### Generations
+
+
+| reference generation id |
+|-------------------------|
+| rg1                     |
+| rg2                     |
+| rg3                     |
+| rg4                     |
+
+| garbage collection iteration id |
+|---------------------------------|
+| gci1                            |
+| gci2                            |
+| gci3                            |
+| gci4                            |
+
+##### Blobs
+
+| blob id | reference generation id |
+|---------|-------------------------|
+| b3      | rg2                     |
+| b4      | rg2                     |
+| b6      | rg3                     |
+
+##### References
+
+| message id | blob id | reference generation id |
+|------------|---------|-------------------------|
+| m4         | b3      | rg2                     |
+| m5         | b4      | rg2                     |
+| m6         | b4      | rg2                     |
+| m9         | b6      | rg3                     |
+
+##### Deletions
+
+| blob id | reference generation id | date  | garbage collection iteration id |
+|---------|-------------------------|-------|---------------------------------|
+| b6      | rg3                     | 10:42 | gci3                            |
+
+#### Tables: after deletions
+
+##### Generations
+
+| reference generation id |
+|-------------------------|
+| rg1                     |
+| rg2                     |
+| rg3                     |
+| rg4                     |
+
+| garbage collection iteration id |
+|---------------------------------|
+| gci1                            |
+| gci2                            |
+| gci3                            |
+| gci4                            |
+
+##### Blobs
+
+| blob id | reference generation id |
+|---------|-------------------------|
+| b3      | rg2                     |
+| b4      | rg2                     |
+| b6      | rg3                     |
+
+##### References
+
+| message id | blob id | reference generation id |
+|------------|---------|-------------------------|
+| m4         | b3      | rg2                     |
+| m5         | b4      | rg2                     |
+| m6         | b4      | rg2                     |
+
+##### Deletions
+
+| blob id | reference generation id | date  | garbage collection iteration id |
+|---------|-------------------------|-------|---------------------------------|
+| b6      | rg3                     | 10:42 | gci3                            |
+| b6      | rg3                     | 18:42 | gci4                            
|                  |
+
+#### Running the algorithm
+
+ * fetch `Deletions` for `gci4` in `deletions`
+ * find distinct `generation-id` of `deletions` in `generations = {rg3}`
+ * For each generation
+   * *rg3*
+     * filter `deletions` to keep only `rg3` entries and extract `blob-ids` in 
`concernedBlobs = {b6}`
+     * fetch all references to `concernedBlobs` and build a Bloom-Filter in 
`foundedReferences = {}`
+     * filter `concernedBlobs` to keep only those which are not present in 
`foundedReferences` in `blobsToDelete = {b6}`
+     * Remove `blobsToDelete` from `Blobs` and `Deletions`
+
+
+#### Tables: after garbage collection
+
+##### Generations
+
+
+| reference generation id |
+|-------------------------|
+| rg1                     |
+| rg2                     |
+| rg3                     |
+| rg4                     |
+
+| garbage collection iteration id |
+|---------------------------------|
+| gci1                            |
+| gci2                            |
+| gci3                            |
+| gci4                            |
+
+##### Blobs
+
+| blob id | reference generation id |
+|---------|-------------------------|
+| b3      | rg2                     |
+| b4      | rg2                     |
+
+##### References
+
+| message id | blob id | reference generation id |
+|------------|---------|-------------------------|
+| m4         | b3      | rg2                     |
+| m5         | b4      | rg2                     |
+| m6         | b4      | rg2                     |
+
+##### Deletions
+
+Empty
+
+


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to