[ 
https://issues.apache.org/jira/browse/JAMES-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315415#comment-17315415
 ] 

Benoit Tellier commented on JAMES-3544:
---------------------------------------

> There is an API proposal for blob deletion: 
> https://tools.ietf.org/html/draft-gondwana-jmap-blob-01

I am watching this draft; once it is a bit more formal, it will definitely be 
nice to have.

However, I do believe it is not enough. Pushing such a privacy concern onto the 
client seems a weak way to enforce it, hence I did not mention it here.

> JMAP uploaded blobs are never deleted
> -------------------------------------
>
>                 Key: JAMES-3544
>                 URL: https://issues.apache.org/jira/browse/JAMES-3544
>             Project: James Server
>          Issue Type: Sub-task
>          Components: Blob, JMAP
>    Affects Versions: 3.6.0
>            Reporter: Benoit Tellier
>            Assignee: Antoine Duprat
>            Priority: Major
>
> This is a concern for both privacy and cost control (as one needs to pay for 
> storage).
> JMAP defines no method to delete uploaded blobs (maybe I could propose 
> something at the IETF).
> https://jmap.io/spec-core.html#uploading-binary-data suggests that the server 
> might decide to delete the data:
> {code:java}
> Under rare circumstances, the server may have deleted the blob before the 
> client uses it; 
> the client should keep a reference to the local file so it can upload it 
> again in such a situation.
> {code}
> *Root cause of the issue*
> We rely on the AttachmentManager for uploads, which is inherited from JMAP 
> draft.
> AttachmentManager uses the following fallback mechanism to check access 
> rights:
>  - First, check whether the user accessing the content holds a message 
> referencing that attachment.
>  - If not, check whether that user uploaded the attachment.
> AttachmentManager holds some data referenced by user messages, thus automatic 
> deletion without a clear separation of concepts looks scary...
> *How:* 
> We should deprecate the following AttachmentMapper methods (and underlying 
> storage code) - and simplify AttachmentManager code accordingly:
> {code:java}
> public interface AttachmentMapper extends Mapper {
>     // to be deprecated
>     Publisher<AttachmentMetadata> storeAttachmentForOwner(ContentType contentType, InputStream attachmentContent, Username owner);
>     Collection<Username> getOwners(AttachmentId attachmentId) throws MailboxException;
> }
> {code}
> We should write an UploadedContentRepository, holding only the content, the 
> content-type, the owner and the size of the data. The upload date can be 
> useful too, even if not requested by JMAP APIs. It would be backed by the 
> BlobStore (and thus ObjectStorage), and we will also need a metadata system 
> on top of it (Cassandra).
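
To make the shape of such a repository concrete, here is a minimal in-memory 
sketch. It is only an illustration under assumptions: the class name 
InMemoryUploadedContentRepository, the Upload holder and both method names are 
hypothetical, plain strings stand in for the James Username and ContentType 
types, and a map stands in for the BlobStore plus Cassandra metadata.

```java
import java.time.Clock;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in; a real implementation would store the
// content in the BlobStore and the metadata in Cassandra.
final class InMemoryUploadedContentRepository {

    // Holds only what the proposal lists: content, content type, owner,
    // size, plus the upload date.
    static final class Upload {
        final String id;
        final String owner;
        final String contentType;
        final long size;
        final Instant uploadDate;
        final byte[] content;

        Upload(String id, String owner, String contentType, Instant uploadDate, byte[] content) {
            this.id = id;
            this.owner = owner;
            this.contentType = contentType;
            this.size = content.length;
            this.uploadDate = uploadDate;
            this.content = content.clone(); // defensive copy
        }
    }

    private final Map<String, Upload> uploads = new ConcurrentHashMap<>();
    private final Clock clock;

    InMemoryUploadedContentRepository(Clock clock) {
        this.clock = clock;
    }

    // Stores the content for its owner and records the upload date.
    Upload upload(String owner, String contentType, byte[] content) {
        Upload upload = new Upload(UUID.randomUUID().toString(), owner, contentType, clock.instant(), content);
        uploads.put(upload.id, upload);
        return upload;
    }

    // Only the uploader can read the content back: no message-based fallback.
    Optional<Upload> retrieve(String id, String requester) {
        return Optional.ofNullable(uploads.get(id))
            .filter(upload -> upload.owner.equals(requester));
    }
}
```

The point of the sketch is that ownership is the only access rule, so 
uploaded content never gets mixed with data referenced by user messages.
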
> Data expiry would be achieved via bucket deletion: all data uploaded in a 
> month is held in a bucket, and at month+2 the bucket can be dropped, in 
> order to ensure no data younger than a month is deleted. We can likely accept 
> dangling metadata as no critical data is held there (user, size, content 
> type). If needed, a scroll could come and clean up expired metadata, but it 
> might be expensive to run.
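
The month-based bucketing rule can be sketched as follows. This is a rough 
illustration, not a settled design: the UploadBuckets class, the "uploads-" 
prefix and both method names are assumptions of mine, and months are computed 
in UTC.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

final class UploadBuckets {
    // Hypothetical bucket name prefix.
    private static final String PREFIX = "uploads-";

    // All data uploaded during the same (UTC) month lands in the same bucket.
    static String bucketFor(Instant uploadDate) {
        YearMonth month = YearMonth.from(uploadDate.atZone(ZoneOffset.UTC));
        return PREFIX + month; // YearMonth prints as yyyy-MM, e.g. 2021-04
    }

    // A bucket for month M can be dropped once the current month is M+2 or
    // later: the whole of month M+1 has then elapsed, so even data uploaded
    // on the last day of M is at least a month old.
    static boolean isDroppable(YearMonth bucketMonth, Instant now) {
        YearMonth current = YearMonth.from(now.atZone(ZoneOffset.UTC));
        return !bucketMonth.plusMonths(2).isAfter(current);
    }
}
```

For instance, an upload on 2021-04-08 goes to "uploads-2021-04", which is not 
droppable on 2021-05-31 but becomes droppable on 2021-06-01.
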
> A webAdmin endpoint would trigger the cleanup; we would rely on an external 
> scheduler to call that endpoint.
> We follow a similar design for the DeletedMessageVault 
> (https://issues.apache.org/jira/browse/JAMES-2811)
> I bet my team could be working on this topic, but we do not have a plan on 
> this just yet.
> *Impact*
>  - Blobs uploaded before this proposed change will be accessible via the 
> AttachmentManager uploader-rights path until that path is removed, and 
> inaccessible afterwards.
>  - Cleanup of blob content uploaded before this change gets applied is a 
> non-goal of my proposal. A separate batch could be used, reading Cassandra 
> data and deleting uploaded blobs. A task could maybe even be exposed for such 
> needs... 
> *Definition of done*
> Demonstrate data expiry in an integration test, playing with a mocked clock 
> injected via Guice.
> Documentation needs to be written so that admins do not forget to schedule 
> the cleanup task.
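
As a rough idea of what such a test could exercise, the sketch below drives 
the expiry decision from a settable clock. Everything here is an assumption 
for illustration: MutableClock and ExpiredBucketSelector are hypothetical 
names, and plain constructor injection stands in for the Guice binding.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Test clock whose current instant can be moved forward by the test.
final class MutableClock extends Clock {
    private Instant now;

    MutableClock(Instant start) {
        this.now = start;
    }

    void setInstant(Instant instant) {
        this.now = instant;
    }

    @Override
    public ZoneId getZone() {
        return ZoneOffset.UTC;
    }

    @Override
    public Clock withZone(ZoneId zone) {
        return this; // zone changes are ignored in this sketch
    }

    @Override
    public Instant instant() {
        return now;
    }
}

// Picks the monthly buckets the cleanup is allowed to drop.
final class ExpiredBucketSelector {
    private final Clock clock;

    ExpiredBucketSelector(Clock clock) {
        this.clock = clock;
    }

    // A bucket for month M is expired once the clock reaches month M+2.
    List<YearMonth> expired(Collection<YearMonth> existingBuckets) {
        YearMonth current = YearMonth.now(clock);
        return existingBuckets.stream()
            .filter(month -> !month.plusMonths(2).isAfter(current))
            .sorted()
            .collect(Collectors.toList());
    }
}
```

A test would create buckets for a couple of months, move the clock with 
setInstant, and assert that a bucket only shows up as expired once the clock 
reaches month+2.
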



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
