Thanks for the reply!

Regarding byte Array usage I cannot more than share your views.

Work would be needed in my opinion to have the blobStore used in place
of `Message` byte fields.

However, as much as I do care about it, memory usage is not my concern
of the day.

My concern of the day is the fact **To read fast JMAP property I need
slow full message loading**, and this is what my proposition addresses.

I encourage you to split your **James usage of byte arrays** discussion
in a separated thread (let's discuss this! but let's honor this with
it's standalone discussions :-))

On 28/11/2019 17:26, Matthieu Baechler wrote:
> Hi,
> 
> On Thu, 2019-11-28 at 15:51 +0700, Tellier Benoit wrote:
>> Hello all,
>>
>> ## Context
>>
>> We are working on JMAP, and EMail::hasAttachments metadata is listed
>> as
>> a fast property.
>>
>> [...]
> 
> I'm glad you trigger this discussion because it's a very important one
> if we want to have a fast server in the future.
> 
> I will share my analysis regarding James usage of byte arrays.
> 
> [...]
> 
> And it leads me to the actual topic of this email: of course we need to
> compute as many things as possible to prevent loading the raw data.

Let's not confuse things.

I'm not speaking of computing anything here.

I'm just speaking of better structuring already available data.

> 
>> ## Proposal
>>
>> Introduce a new POJO: MessageAttachmentMetadata (mailbox-api)
>>  - id
>>  - name
>>  - cid
>>  - isInline
>>  - size
>>  - type
>>
>>  - Message (mailbox-store) & MessageResult (mailbox-api) SHOULD
>> return
>> MessageAttachmentMetadata NOT MessageAttachment. Thus these metadata
>> will be added at the FetchGroup.profile.MINIMAL.
> 
> Do we always need attachments information?

Of course not.

We can think about a separate FetchGroup / FetchType read level if that
is an issue.

I want to bring your attention to our current usage:
 - Cassandra is always reading these metadata, which can be cheaply coputed.
 - Memory has it for free
 - Over implementations do lazy load this from the full content (and
parse it) - maildir + jpa.

Which mean that the impact of this decision is limited.

> 
>> We need to port "scanning search" to do on the fly message parsing. 
> 
> Not sure to understand this sentence

Scanning search iterates messages. Upon **attachment** criterion, read
the message full content and parse it to retrieve attachment list (that
might not be stored with some implementation).

Did I explained it better?

> 
>> This
>> is OK as:
>>  - memory-guice is not intending for production usage, no need to be
>> performant
>>  - Usage of scanning search is not exposed as a product
>>  - jpa and maildir do not store attachment so recomputation is the
>> current behaviour.
>>
>> ## Consequences
>>
>> JMAP Email::hasAttachment property would rely on
>> FetchGroup.Profile.MINIMAL, allowing the implementation of
>> https://github.com/apache/james-project/blob/master/src/adr/0013-precompute-jmap-preview.md
>>
>>
>> JMAP GetMessages with full profile will read 2 time less data
>> allowing
>> both a cost and performance improvment.
>>
>> Note that all caller reading full message will also benefit from
>> these
>> changes (IMAP fetch, mailbox backup, review recomputation)
>>
>> ## Alternative
>>
>> We could merge MessageAttachment & Attachment together however this
>> would lead to significant datastructure re-arranging for no
>> behavioural
>> gains and just a slightly more lean API.
>>
>> Thus I propose not to takle this now.
> 
> I'm not sure to understand this alternative.

That's not important here. I'm saying there is a 1 to 1 replationship
between Attachment and MessageAttachment POJOs (as I recall it this is a
vestige of content deduplication at the attachment level).

I see that we can get rid of Attachment POJO, but that would require
(cassandra) storage re-structuration. For API gains.

> 
> What about this one: remove bytes from Attachment and always call
> BlobStore when you need to read the raw data (or put it in a temp file
> for inbound emails)

The issue with this is that BlobStore is an implementation details of
mailbox-* and is not part of the API.

Sure, that's achievable with some indirection:

```
public interface AttachmentMapper extends Mapper {
   //...

   void storeAttachmentForOwner(Attachment attachment, Username owner
byte[] payload) throws MailboxException;

   byte[] loadContent(AttachmentId attachmentId)

   //..
}
```

Doable too.

To preserve attachment search capabilities for back-end not supporting
attachment storage, additional levels of indirection might be required
(and could be tricky).

> 
> Looking at the code, it should not be that hard to implement and we
> could monitor BlobStore usage by implementing something in glowroot.
> That would be a first step toward a no-byte-array strategy.
> 
> [...]
> 
> Cheers,
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org

Reply via email to