[
https://issues.apache.org/jira/browse/JAMES-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411930#comment-17411930
]
Benoit Tellier edited comment on JAMES-3150 at 9/8/21, 1:31 PM:
----------------------------------------------------------------
Here is some feedback from running the BloomFilter scanning algorithm:
Run 1: many deletions, listing batch size of 1,000 blobs at a time
About 24.5 hours for 67M blobs, 35M deletions.
{code:java}
{
  "additionalInformation": {
    "type": "BlobGCTask",
    "timestamp": "2021-09-08T08:14:34.883206Z",
    "referenceSourceCount": 49035372,
    "blobCount": 67546056,
    "gcedBlobCount": 34933325,
    "errorCount": 0,
    "bloomFilterExpectedBlobCount": 100000000,
    "bloomFilterAssociatedProbability": 0.02
  },
  "status": "completed",
  "taskId": "4b981c40-0d1f-4c9a-9bf9-d0aae5779647",
  "startedDate": "2021-09-07T07:50:05.648+0000",
  "completedDate": "2021-09-08T08:14:35.038+0000",
  "executedOn": "james-jmap-988b8f869-cwxwn",
  "submittedFrom": "james-jmap-988b8f869-cwxwn",
  "cancelledFrom": null,
  "submitDate": "2021-09-07T07:50:05.576+0000",
  "type": "BlobGCTask"
}
{code}
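To put the two Bloom filter parameters of this report into perspective, here is a minimal sketch of the memory they imply. It uses the textbook sizing formulas for an optimally sized filter, which is an assumption: the filter James actually allocates may differ. `bloom_filter_size` is a hypothetical helper, not part of James.

```python
import math


def bloom_filter_size(expected_items: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) for a textbook optimally sized Bloom filter.

    Assumption: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    """
    bits = math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / expected_items * math.log(2)))
    return bits, hashes


# Values taken from the Run 1 report:
# bloomFilterExpectedBlobCount = 100000000, bloomFilterAssociatedProbability = 0.02
bits, hashes = bloom_filter_size(100_000_000, 0.02)
size_mib = bits / 8 / 2**20
print(f"{size_mib:.1f} MiB, {hashes} hash functions")
```

Under these assumptions the filter needs a bit more than 8 bits per expected blob, so slightly under 100 MiB for 100 million expected blobs.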
Run 2: few deletions, listing batch size of 10,000 blobs at a time
About 2 hours for 32 million blobs, 3,251 deletions.
{code:java}
{
  "additionalInformation": {
    "type": "BlobGCTask",
    "timestamp": "2021-09-08T12:46:59.008272Z",
    "referenceSourceCount": 49035372,
    "blobCount": 32612766,
    "gcedBlobCount": 3251,
    "errorCount": 0,
    "bloomFilterExpectedBlobCount": 67546056,
    "bloomFilterAssociatedProbability": 0.02
  },
  "status": "completed",
  "type": "BlobGCTask",
  "taskId": "01dd426c-7c03-467e-a25f-5426b618773b",
  "startedDate": "2021-09-08T10:49:58.916+0000",
  "completedDate": "2021-09-08T12:46:59.055+0000",
  "executedOn": "james-jmap-84bb8c66c5-qsdpf",
  "submittedFrom": "james-imap-smtp-c4fdffbdd-vwffh",
  "cancelledFrom": null,
  "submitDate": "2021-09-08T10:49:58.762+0000"
}
{code}
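The two reports can be compared on a per-second basis with a small script. This is only a sketch: `summarize` is a hypothetical helper, and it assumes that `blobCount` is the number of blobs listed during the run and that the run lasted from `startedDate` to `completedDate`.

```python
from datetime import datetime


def parse(ts: str) -> datetime:
    # Task dates in the reports look like 2021-09-07T07:50:05.648+0000
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def summarize(started: str, completed: str, blob_count: int, gced: int) -> dict:
    """Derive duration, listing throughput and deletion ratio from one task report."""
    seconds = (parse(completed) - parse(started)).total_seconds()
    return {
        "hours": seconds / 3600,
        "blobs_per_second": blob_count / seconds,
        "deleted_ratio": gced / blob_count,
    }


# Values copied from the Run 1 and Run 2 reports
run1 = summarize("2021-09-07T07:50:05.648+0000", "2021-09-08T08:14:35.038+0000",
                 67_546_056, 34_933_325)
run2 = summarize("2021-09-08T10:49:58.916+0000", "2021-09-08T12:46:59.055+0000",
                 32_612_766, 3_251)

for name, run in (("run 1", run1), ("run 2", run2)):
    print(f"{name}: {run['hours']:.1f} h, "
          f"{run['blobs_per_second']:.0f} blobs/s, "
          f"{run['deleted_ratio']:.2%} deleted")
```

With these numbers, Run 1 processes a bit under 800 blobs per second and Run 2 a bit over 4,600, so Run 2 is roughly six times faster per blob. Since both the batch size and the deletion volume changed between the runs, this figure alone does not say which factor is responsible.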
We will do a *third run* tomorrow with a listing batch size of 1,000 blobs and
no expected deletions. This will let us determine which factor made the first
run slow: the small page size or the deletions.
We could also plan a *fourth run* to explore whether increasing the blob
listing batch size even more further improves performance.
> Implement Garbage Collection for blobs
> --------------------------------------
>
> Key: JAMES-3150
> URL: https://issues.apache.org/jira/browse/JAMES-3150
> Project: James Server
> Issue Type: Improvement
> Components: Blob
> Affects Versions: 3.3.0
> Reporter: Gautier DI FOLCO
> Priority: Major
> Time Spent: 8.5h
> Remaining Estimate: 0h
>
> With the blob store deduplication, dropping a blob in a distributed
> environment is impossible if we want to keep an acceptable concurrency level.
> A Garbage Collector should be created in order to drop old blobs.