Hello,

I'm taking my evening to respond to this with some questions, especially since I saw the PR is already being pushed forward and I am not convinced by the design decisions that were made.

> Today James does not support multi-tenancy for blob store. Therefore blob
> isolation between domains could be an issue for example in a SaaS
> deployment that requires strict data isolation for users.

Would you say that James supports multi-tenancy in other parts? Why focus on the blob store? James can handle multiple domains as long as you don't need to do TLS/SSL or DKIM properly.

Is a domain the same thing as a tenant? If yes, why introduce a new concept? If no, what would you say is the difference or relationship between a domain and a tenant? How do you expect the new concept to be reflected in the various APIs? Pulsar, for instance, has a multi-tenant API that scopes all operations: there, it is clearly a first-class concept. Should tenants also be a first-class concept in James?

You propose to introduce another concept, `Bucket`, which is composed of a `BucketName` and an optional `Tenant`. How does the `Tenant` relate to the `BucketName`? Shouldn't the tenant or domain be a first-class parameter of the storage APIs instead?

The Jira mentions:

> That way blobstore could implement different isolation strategies for tenants (configurable):
> - buckets as today - good for few tenants after all.

This suggests that one can already use buckets to isolate tenants/domains, which in turn suggests that the `BucketName` passed to the `BlobStore` API today is already usable as a discriminant for tenants/domains (though I have yet to find any use of it in the code with something other than the default bucket name). If `BucketName` is the current parameter for isolating tenants, shouldn't it be replaced by the new tenant concept instead of adding another piece of information? How should a programmer decide when to use which?

Do we really need to propagate the domain/tenant concept to all the storage APIs? Will it be necessary for the MailRepositories too, for instance?

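To make the "first-class parameter" question concrete, here is a rough sketch of the two shapes side by side. It is only an illustration and not a proposal for the actual James API: `TenantScopedBlobStore` and `InMemoryTenantScopedBlobStore` are hypothetical names, and `BucketName`, `BlobId` and `Tenant` are simplified stand-ins for the real types.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for the James types, just enough to compile on their own.
record BucketName(String value) {}
record BlobId(String value) {}
record Tenant(String name) {}

// Shape proposed in the original mail: the tenant is an optional part of the bucket.
record Bucket(BucketName bucketName, Optional<Tenant> tenant) {
    static Bucket of(BucketName bucketName) {
        return new Bucket(bucketName, Optional.empty());
    }
}

// Hypothetical alternative: the tenant is a mandatory argument of every operation.
interface TenantScopedBlobStore {
    void save(Tenant tenant, BucketName bucketName, BlobId blobId, byte[] data);

    Optional<byte[]> read(Tenant tenant, BucketName bucketName, BlobId blobId);
}

// Toy in-memory implementation: the tenant is part of the storage key,
// so a blob saved for one tenant is not visible to another tenant.
final class InMemoryTenantScopedBlobStore implements TenantScopedBlobStore {
    private record Key(Tenant tenant, BucketName bucketName, BlobId blobId) {}

    private final Map<Key, byte[]> blobs = new ConcurrentHashMap<>();

    @Override
    public void save(Tenant tenant, BucketName bucketName, BlobId blobId, byte[] data) {
        blobs.put(new Key(tenant, bucketName, blobId), data.clone());
    }

    @Override
    public Optional<byte[]> read(Tenant tenant, BucketName bucketName, BlobId blobId) {
        return Optional.ofNullable(blobs.get(new Key(tenant, bucketName, blobId)))
            .map(bytes -> bytes.clone());
    }
}
```

With the `Bucket` shape a caller can silently omit the tenant via `Bucket.of(...)`; with the explicit parameter the compiler forces every call site to say which tenant it acts for. Which of the two is wanted is exactly what I would like to see discussed.
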
Overall, isn't this change to introduce multi-tenancy large enough to warrant an ADR, if only to answer these questions and document the concepts (possibly retro-document them)?

Jean

On Mon, 4 Nov 2024 at 04:45, Quan tran hong <quan.tranhong1...@gmail.com> wrote:

> Hi everyone,
>
> Today James does not support multi-tenancy for blob store. Therefore blob
> isolation between domains could be an issue for example in a SaaS
> deployment that requires strict data isolation for users.
>
> We (Linagora) think it would be good to implement multi-tenancy for
> the blob store and would like to contribute it. We would like to propose
> an approach and would love to hear the community's thoughts.
>
> Firstly, we propose to refactor the `BlobStore` API a bit so that it
> accepts the tenant information.
>
> We introduce some classes to contain the tenant information:
>
> ```
> public record Tenant(String name) {
>     public static Tenant from(Domain domain) {
>         return new Tenant(domain.asString());
>     }
>
>     public String asString() {
>         return name;
>     }
> }
>
> public record Bucket(BucketName bucketName, Optional<Tenant> tenant) {
>     public static Bucket of(BucketName bucketName) {
>         return new Bucket(bucketName, Optional.empty());
>     }
> }
> ```
>
> We refactor the `BlobStore` APIs to accept the tenant input, e.g.
> `InputStream read(BucketName bucketName, BlobId blobId);` becomes
> `InputStream read(Bucket bucket, BlobId blobId);`.
>
> Then each implementation (S3, File, Postgres...) can choose whether it
> implements multi-tenancy.
>
> Here we propose some options to implement multi-tenancy for S3.
>
> ## S3
>
> ### Configuration
>
> ```
> multi-tenancy.mode=none|bucket|ssec|prefix
> ```
>
> Defaults to no multi-tenancy behavior, as of today.
>
> ### S3 multi-tenancy options
>
> #### bucket
>
> Each tenant uses one dedicated S3 bucket.
>
> Notes: GC is likely broken and shall be tested with this mode.
>
> #### prefix
>
> Each tenant uses one dedicated prefix while sharing the same bucket.
>
> Notes: We shall make sure that the GC, when listing, only takes the last
> part of the s3Key, i.e. given `prefix/ABC` the GC only uses ABC as a
> blobId.
>
> #### ssec (server-side encryption with customer-provided keys)
>
> Each tenant would use a derived SSE-C encryption key to encrypt/decrypt
> data. That way tenant A won't have tenant B's key and so cannot negatively
> impact tenant B's data.
>
> Notes: This implementation should fail with the deduplicating blobStore.
>
> That is the core idea of the S3 multi-tenancy implementation. We
> plan to implement blob store multi-tenancy for other implementations, e.g.
> File, PostgreSQL and Memory, too. For more details on those implementations
> please have a look at the Jira ticket, cf.
> https://issues.apache.org/jira/projects/JAMES/issues/JAMES-4085.
>
> cc @Linagora colleagues, please add to this if I am wrong or missing anything.
>
> We would love to hear the community's feedback on this. Is this topic
> interesting to you? What other implementations do you have in mind? Please
> let us know :-).
>
> Thanks for reading.
>
> Regards,
>
> Quan
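
On the `prefix` mode quoted above: the GC note amounts to a round trip between blob ids and S3 keys, which could look roughly like the sketch below. `PrefixKeys` and both method names are hypothetical, and the sketch assumes that blob ids never contain `/`; nothing here is taken from the actual James S3 code.

```java
import java.util.Optional;

// Hypothetical helper for the "prefix" mode: one shared bucket, one key prefix per tenant.
final class PrefixKeys {
    private PrefixKeys() {}

    // Builds the S3 key: "<tenant>/<blobId>" when a tenant is given, the bare blob id otherwise.
    static String toS3Key(Optional<String> tenant, String blobId) {
        return tenant.map(name -> name + "/" + blobId).orElse(blobId);
    }

    // What the GC would do when listing: keep only the last part of the key.
    // Assumption: blob ids never contain '/', so the last '/' ends the prefix.
    static String blobIdOf(String s3Key) {
        int lastSlash = s3Key.lastIndexOf('/');
        return lastSlash < 0 ? s3Key : s3Key.substring(lastSlash + 1);
    }
}
```

Given `prefix/ABC`, `blobIdOf` returns `ABC`, and a key without any prefix is returned unchanged, so the same GC code would cope with both the `none` and `prefix` modes.
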