Benoit Tellier created JAMES-4198:
-------------------------------------

             Summary: Optimize message write path and DB footprint
                 Key: JAMES-4198
                 URL: https://issues.apache.org/jira/browse/JAMES-4198
             Project: James Server
          Issue Type: Improvement
          Components: cassandra, IMAPServer, jpa, mailbox
            Reporter: Benoit Tellier


h3. Why ? 

 - We needlessly store information in the `messagev3` properties that is either 
already in the headers or easy to obtain. This takes up space on Cassandra.

Table sizes (not tiered, with per-message entries) for 66 million emails 
(3 nodes, RF=3):
 - `messagev3` table 17 GB
 - `imapuidtable` table 10 GB
 - `messageidtable` table 7 GB
 - `email_query_view_received_at` table 2 GB
 - `firstunseen` table 287 MB
 - `thread_lookup_3` table 6 GB

We see a footprint of ~2-3 KB (replicated, tiered) per message.

We can expect a 33% reduction of the `messagev3` size by removing the content 
description and properties fields, which translates to a 10-13% overall space 
saving. At scale, for 10 billion messages, this means 20 TB -> 18 TB. That is a 
lot of storage for something that is only useful for IMAP FETCH BODYSTRUCTURE 
and could easily be recomputed.
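
As a rough cross-check of the figures above, here is a minimal sketch that redoes the arithmetic on the listed table sizes. The class and method names are hypothetical, and it only considers the six tables listed here: under that assumption a 33% cut of `messagev3` comes to roughly 13% of the listed tables, the upper end of the quoted range.

```java
public final class FootprintEstimate {
    // Table sizes in GB, copied from the list above.
    static final double MESSAGE_V3_GB = 17.0;
    static final double[] OTHER_TABLES_GB = {10.0, 7.0, 2.0, 0.287, 6.0};

    /** Sum of the listed per-message tables, in GB. */
    public static double totalGb() {
        double total = MESSAGE_V3_GB;
        for (double size : OTHER_TABLES_GB) {
            total += size;
        }
        return total;
    }

    /** Fraction of the listed tables saved if messagev3 shrinks by the given ratio. */
    public static double savedFraction(double messageV3Reduction) {
        return MESSAGE_V3_GB * messageV3Reduction / totalGb();
    }

    public static void main(String[] args) {
        System.out.printf("total = %.1f GB, saved = %.1f%%%n",
            totalGb(), 100 * savedFraction(0.33));
    }
}
```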

 - We count lines with an unoptimized input stream for each message with content 
type `text/*`, reading byte by byte (PERF KILLER!), while the line count is only 
useful upon IMAP FETCH BODYSTRUCTURE. We would rather move this to read time.

 - Lastly, MessageStorer triggers parsing for each and every message. We could 
easily carry over the content type (after removing PropertyBuilder) and trigger 
this expensive parsing if and only if, in the main headers, the content type is 
`multipart/*` or `content-disposition` is `attachment`, saving CPU on the write 
path.
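
The condition in the last point could be sketched as a small predicate over the top-level headers. This is only an illustration: `ParsingDecision` and `needsFullParsing` are hypothetical names, not existing James APIs, and the header values are assumed to be passed in as raw strings.

```java
import java.util.Locale;
import java.util.Optional;

public final class ParsingDecision {
    /**
     * Returns true when the main headers suggest the message may carry
     * attachments, so the expensive full parsing is worth triggering.
     * Hypothetical helper: header values are the raw top-level header strings.
     */
    public static boolean needsFullParsing(Optional<String> contentType,
                                           Optional<String> contentDisposition) {
        // Content-Type: multipart/* (media types are case-insensitive)
        boolean multipart = contentType
            .map(value -> value.trim().toLowerCase(Locale.ROOT))
            .filter(value -> value.startsWith("multipart/"))
            .isPresent();
        // Content-Disposition: attachment, possibly followed by parameters
        boolean attachment = contentDisposition
            .map(value -> value.trim().toLowerCase(Locale.ROOT))
            .filter(value -> value.startsWith("attachment"))
            .isPresent();
        return multipart || attachment;
    }
}
```

With such a check, a plain `text/plain` message with no `content-disposition` header would skip the parsing step entirely on the write path.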

h3. How ?

Remove PropertyBuilder from Message POJOs.

IMAP FETCH BODYSTRUCTURE operates on full content: we can easily recompute this 
in MessageResult POJO when (and only when) needed.
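
A minimal sketch of what "recompute when (and only when) needed" could look like for the line count, assuming the full content can be re-opened on demand. `LazyLineCount` is a hypothetical class, not part of James, and counting `\n` bytes is an assumption about how lines are defined here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

public final class LazyLineCount {
    private final Supplier<InputStream> content;
    private Long cached; // null until the count is first requested

    public LazyLineCount(Supplier<InputStream> content) {
        this.content = content;
    }

    /** Counts '\n' bytes, reading in 8 KiB chunks rather than byte by byte. */
    public static long countLines(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long lines = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }

    /** Computes the count on first call only; later calls reuse the cached value. */
    public synchronized long get() {
        if (cached == null) {
            try (InputStream in = content.get()) {
                cached = countLines(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return cached;
    }
}
```

Nothing is read at construction time, so messages that are never queried with FETCH BODYSTRUCTURE would pay no line-counting cost.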

Take care to still carry over contentType and ContentDescription for the 
separate but related MessageStorer optimization.

h3. Expected gains

Significant CPU gains for `text/*` message APPEND / reception

~ 10% data reduction on Cassandra



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

