Re: [VOTE] Proposal to adopt ADR-76 Deleted message vault should use a single bucket

Benoit TELLIER Mon, 19 Jan 2026 07:05:00 -0800

Hello Jean

> each feature should use a unique blobstore bucket


+1 

The "deleted message vault" follows this rule with this proposal, and JMAP
uploads follow it too.

> some features share similar storage properties/requirement and could live in a
> single s3 bucket

This is the case for

 - mailbox
 - RabbitMQ mail queue
 - Cassandra + Postgres mail repositories
 - Attachments

They share the default bucket in order to be deduplicated.
This cross-feature deduplication is explicitly desired, cf.
https://github.com/apache/james-project/blob/aef61d6fc9655698a8bff1521398ff110388e2d1/server/apps/distributed-app/src/test/java/org/apache/james/WithCassandraDeduplicationBlobStoreTest.java#L113
 

> The current choice is either a full s3 bucket per feature or a single s3
> bucket for all blobstore buckets.
> - no blob.properties prefix, james uses one top level buckets per feature
> - blob.properties prefix : james stores EVERY blob in a single bucket

To be fairly honest, there is nothing harder to migrate than an S3 naming layout,
so I'm reluctant to make changes affecting it.

Furthermore, I believe we would be better served by being more declarative and
having each feature explicitly declare its buckets (and object name prefix?).
Changing the meaning of blob.properties objectstorage.bucketPrefix (
https://github.com/apache/james-project/blob/aef61d6fc9655698a8bff1521398ff110388e2d1/server/apps/distributed-app/sample-configuration/blob.properties#L66
 ) would create even more confusion on the topic, and so would adding yet another
"prefix" concept to this file.

I understand that operating off 3 buckets might not be ideal, but it is likely to
still be very manageable, no?

-- 

Best regards,

Benoit TELLIER

General manager of Linagora VIETNAM.
Product owner for Twake-Mail product.
Chairman of the Apache James project.

Mail: [email protected]
Tel: (0033) 6 77 26 04 58 (WhatsApp, Signal)


On Jan 19, 2026 12:16 PM, Jean Helou <[email protected]> wrote:

To sum up my take on the discussions so far:

On the specific case of the deleted message vault, the proposed
implementation :

Introduce a logical bucket (blobstore) prefix to the time-based structure.
This allows storing all deleted messages in a single S3 bucket or, combined
with the underlying S3 prefix capability, having a single top-level
"directory" for all these messages. All new messages will be written to the
new structure; both the new and old addressing schemes will be readable.
The old addressing scheme will be announced as deprecated but maintained
for 5 years (about 3 or 4 major releases?), ensuring every user who
updates on a reasonable timescale will have a proper upgrade path.
The name of the logical blobstore bucket will be configurable, enabling
users who don't use an S3 prefix to use the bucket of their choice for this
use case.
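
For illustration only, the two addressing schemes could be sketched as below.
This is not the code from PR 2902: the class `VaultAddressing`, the default
bucket name, the zero-padded month, and the use of the blob id as the legacy
object name are all assumptions made for the example. Only the
`[year]/[month]/[blob_id]` layout and the `deleted-messages-[year]-[month]-01`
bucket pattern come from this thread.

```java
import java.time.YearMonth;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, NOT the actual James implementation.
final class VaultAddressing {
    // Assumed default name of the configurable logical vault bucket.
    static final String DEFAULT_VAULT_BUCKET = "deleted-messages";

    // A (bucket, object name) pair identifying where a blob lives.
    record Location(String bucket, String objectName) {}

    // New scheme: a single bucket, object name "[year]/[month]/[blob_id]".
    // Zero-padding of the month is an assumption.
    static Location newLocation(String vaultBucket, YearMonth month, String blobId) {
        String objectName = String.format(Locale.ROOT, "%04d/%02d/%s",
            month.getYear(), month.getMonthValue(), blobId);
        return new Location(vaultBucket, objectName);
    }

    // Legacy scheme: one bucket per month, "deleted-messages-[year]-[month]-01".
    // Using the bare blob id as object name is an assumption.
    static Location legacyLocation(YearMonth month, String blobId) {
        String bucket = String.format(Locale.ROOT, "deleted-messages-%04d-%02d-01",
            month.getYear(), month.getMonthValue());
        return new Location(bucket, blobId);
    }

    // Candidate locations for a read: new layout first, legacy layout as fallback.
    static List<Location> readCandidates(String vaultBucket, YearMonth month, String blobId) {
        return List.of(newLocation(vaultBucket, month, blobId), legacyLocation(month, blobId));
    }
}
```

Under these assumptions, a message deleted in January 2026 would be written to
the `deleted-messages` bucket under `2026/01/<blob_id>`, and a reader needing the
fallback would also look in the bucket `deleted-messages-2026-01-01`.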

I agree this is better than the current implementation and vote in favor
(+1).
I think we should add comments to the fallback-specific code to flag the
expected date of removal in the code. I for one guarantee that I will
forget about it within such a long time span.

With regard to the ADR: as it is currently worded, I still vote against
(-1). Since the vote was initiated on both topics, my compound vote must be
-1, at least as it is worded today.
My position is that:
- blobstore bucket and s3 bucket should not be conflated
- each feature should use a unique blobstore bucket
- how blobstore buckets are stored in s3 buckets should be configurable
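
To make the distinction concrete, here is a minimal sketch of what a configurable
mapping from logical blobstore buckets to S3 buckets could look like. Everything
in it is hypothetical: `BucketLayout`, `S3Location`, and the bucket names are
invented for the example and do not exist in James today.

```java
import java.util.Map;

// Hypothetical sketch: none of these types exist in James.
final class BucketLayout {
    // Where a logical blobstore bucket is stored: an S3 bucket plus an optional key prefix.
    record S3Location(String s3Bucket, String keyPrefix) {
        // Object key for a blob: "<prefix>/<blobId>", or the bare blob id when no prefix is set.
        String keyFor(String blobId) {
            return keyPrefix.isEmpty() ? blobId : keyPrefix + "/" + blobId;
        }
    }

    private final Map<String, S3Location> mapping;

    BucketLayout(Map<String, S3Location> mapping) {
        this.mapping = Map.copyOf(mapping);
    }

    // Unmapped logical buckets default to a dedicated S3 bucket of the same name.
    S3Location resolve(String logicalBucket) {
        return mapping.getOrDefault(logicalBucket, new S3Location(logicalBucket, ""));
    }
}
```

With such a mapping, two features with similar storage requirements could share
one S3 bucket (for example `deleted-messages` and `uploads` both mapped to the
same bucket under different key prefixes), while any other feature would keep
its own S3 bucket.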

I argue that some features share similar storage properties/requirements and
could live in a single s3 bucket with a storage class policy, while other
features require different storage properties/requirements and could live
in different s3 bucket(s) with their own storage class policy.
If we are going to have an ADR about bucket usage, we should not ignore
future features. While the fixed set of 3 buckets may be acceptable today,
do we really want to force users to configure 10 different buckets in the
future?

The current choice is either a full s3 bucket per feature or a single s3
bucket for all blobstore buckets.
- no blob.properties prefix: james uses one top-level bucket per feature
- blob.properties prefix : james stores EVERY blob in a single bucket

My proposal is to move forward with the implementation of the deleted
message vault but reboot the ADR discussion.

Jean


On Fri, Jan 16, 2026 at 09:14, Rene Cordier <[email protected]> wrote:

> Hello,
>
> Overall +1 on my side.
>
> Regarding the second topic, to add a few more details: technically the
> blobstore deleted message vault does not change much in the end. There
> was no new V2 implementation in the end (we should likely update the
> ADR regarding this point). We identified that we could go forward in the
> current vault code to add those modifications:
>
> - append is new, but we just write into the new single bucket so it does
> not collide with the current version. We keep the old code around just
> for testing the fallback (it is tagged accordingly), but it's not used
> anymore outside of tests.
>
> - read and delete: the code does not change! As we do not change the
> underlying design with metadata, it actually works for old buckets and
> the new single one without code modifications there.
>
> - purge task: that's where old code and new code are mixed
> together. I did put a deprecated tag on the concerned old part. If we
> remove it in +2 releases for example, that should be alright? Or even
> more. Honestly the fallback isn't too bad here. If you don't have old
> buckets, the old purge code part will just do nothing!
>
> Hope this helps clarify some points :)
>
> Regards,
>
> Rene.
>
> On 1/15/26 22:59, Benoit TELLIER wrote:
> > Hello Jean
> >
> > Topic 1:
> >
> > I think an ADR structure rework discussion was started on the mailing list already.
> > Regarding this very example:
> >   - The principle of operating off a small number of fixed buckets is likely an architecture decision.
> >     Worth recording in an ADR. It could decide to define an architecture principle.
> >     Then in the *context* we could list offending features and plugins, and *consequences* would list needed refactorings.
> >   - Having an ADR presenting why we have a plugin to fix a generic problem in the email world seems like a reasonable choice to me. I often end up, in architecture discussions (and no later than yesterday!), discussing this topic. Hence it feels justified to keep a track record.
> >
> > Also it seems dangerous to me to say "not relevant, that's a plugin", as everything (the mailbox, the event-bus, etc.) is somehow a plugin, with guice bindings that select some concern-specific implementations and not others.
> >
> > Topic 2: Answers inlined.
> >
> >
> > On Jan 15, 2026 4:13 PM, Jean Helou <[email protected]> wrote:
> >
> > Since you called a formal vote I must vote -1. I may change my vote after
> > the discussion has taken place.
> >
> > 2 topics on which I would like to see some discussion before voting :
> >
> > Topic 1: Why an ADR for an implementation detail in a plugin ?
> > Maybe I'm just too new to james and/or need a refresher on what james
> > considers an architectural decision.
> > It feels to me that this should be part of the README for the plugin. I see
> > that introducing the feature (the deleted message store) was also done
> > through an ADR, which seems a bit surprising to me.
> > I don't disagree on the content of the files themselves (though the ADR
> > formalism is a bit weird for a README), it just doesn't match my
> > understanding of what ADRs are.
> > I personally would prefer to merge the content of both files in a README
> > file on the plugin itself (to document the why and the implementation
> > details of the plugin) and list the plugin somewhere in the
> > documentation (it probably already is) to let people know the feature
> > exists and where to find details about it.
> >
> > On the plugin evolution itself
> > The current behaviour is to create a full s3 bucket following the pattern
> > `deleted-messages-[year]-[month]-01` (it is unclear to me what the 01
> > represents, is it the day ?)
> >
> > It means the first day of the month, and was adopted to get something that looks like a date, if I recall correctly.
> >
> >
> >
> >
> > According to the change proposed in the "ADR", the plugin would now store
> > the same contents under a virtual path within a single bucket
> > `[year]/[month]/[blob_id]`.
> > The proposed migration strategy is:
> > - write only on the single bucket, fall back if necessary on old buckets
> >   for read and delete
> > - add the single bucket usage case to the purge task, which would do
> >   cleaning on both new and old buckets.
> > I think we should work out right now for how long we intend to maintain the
> > fallback and old behaviour in the clean task.
> >
> >
> > +1 typical retention is 1 year so going for 2 years seems reasonable. Likely a bit more to further ease the upgrade path.
> >
> > Probably also clearly communicate (changelog comes to mind) on the
> > deprecation of the old behaviour so we can eventually remove both the
> > fallback and the cleaning task.
> >
> > +1
> >
> > However this has consequences on the users' upgrade path. As proposed they will
> > have to install a version which has both the new behaviour and the fallback
> > for at least the retention period of all the deleted messages in their
> > system.
> >
> > Would you consider providing a migration tool that allows them to move
> > their deleted messages to the new scheme and fast forward on the versions ?
> > (maybe even skip some, as each version ships all the migrations from the
> > previous versions IIRC)
> >
> >
> > To be fair, given its simplicity I'd rather support the fallback for 5 years than move blobs around. Personal taste.
> >
> >
> > I was away from home and had a small car crash so I didn't have time to
> > look into 2902 yet. I had a quick look while writing this message and I was
> > surprised to see an API change that affects the cassandra implementation and
> > introduces something similar to a blobid factory (and used as such) but
> > with a different type. I left a comment to that effect and will continue
> > the review (probably tomorrow or during the weekend)
> >
> > I saw it and I will craft something around it, it seems like a relevant remark.
> >
> >
> > jean
> >
> > On Thu, Jan 15, 2026 at 14:23, Benoit TELLIER <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >>
> >>
> >>
> >> I would like to call a vote on the following change:
> >>
> >>   - github.com/apache/james-project/pull/2894 ADR: Deleted message
> >> vault single bucket usage
> >>
> >>   - github.com/apache/james-project/pull/2902 which is the
> >> implementation of the aforementioned ADR
> >>
> >>
> >> The use of monthly buckets is problematic with certain S3 suppliers that
> >> limit their count, and requires extensive rights on the object store
> >> endpoints.
> >> The proposal would address solely this problematic feature and not attempt
> >> to refactor in depth the way James' buckets are encoded onto the S3 endpoint.
> >>
> >> (More context is provided in the relevant ADR)
> >>
> >>
> >>
> >> This vote is open for at least 72 hours and requires a simple majority.
> >>
> >>
> >> Please vote:
> >>
> >>
> >>   - +1 approve
> >>
> >>
> >>   - 0 no opinion
> >>
> >>
> >>   - -1 disapprove (please explain)
> >>
> >>
> >>
> >> Thanks,
> >>
> >>
> >>
> >
> >
>
