[ 
https://issues.apache.org/jira/browse/JAMES-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338377#comment-17338377
 ] 

Benoit Tellier edited comment on JAMES-3576 at 5/3/21, 1:49 PM:
----------------------------------------------------------------

I have been conducting a preliminary *Proof Of Concept* regarding this.

The code can be seen here: 
https://github.com/chibenwa/james-project/tree/JAMES-3576 which is basically 
enough code to get the manager tests passing (woohoo), plus a migration task 
(run on a 1M+ message load testing environment).

Difficulties encountered:

 - Copy/moves: as headerBlobId is not part of the mapper/MailboxMessage API, it 
could not be accessed on copy. I extended MailboxMessage with a delegate 
pattern to be able to propagate it (see the sketch after this list). Mappers 
thus expect the copied/moved message to be obtained through a mapper read. 
This is the actual behaviour of the managers, but the mapper test suite needs 
to be adapted to this pattern.
 - I discovered that once you have created a column in a Cassandra table, you 
cannot change its type, even after dropping the column...
 - Encountered issues with LWT when running the migration task (SERIAL read 
exceptions, and writes not being applied because the rows were already partly 
there), so I bypassed LWT in the migration task.
 - A log was lost in the migration task...
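
For reference, a minimal sketch of the delegate idea, written against a 
deliberately simplified MailboxMessage interface (the real API has many more 
methods); the MessageWithHeaderBlob name and getHeaderBlobId accessor are 
illustrative assumptions, not the actual POC code:

{code:java}
// Deliberately simplified stand-in for the real MailboxMessage interface
interface MailboxMessage {
    String getMessageId();
    long getUid();
    long getFullContentOctets();
}

// Delegating wrapper carrying the extra headerBlobId so copy/move can propagate it
class MessageWithHeaderBlob implements MailboxMessage {
    private final MailboxMessage delegate;
    private final String headerBlobId;

    MessageWithHeaderBlob(MailboxMessage delegate, String headerBlobId) {
        this.delegate = delegate;
        this.headerBlobId = headerBlobId;
    }

    // The extra information the Cassandra mapper needs when copying/moving
    String getHeaderBlobId() {
        return headerBlobId;
    }

    // Everything else is simply delegated to the wrapped message
    @Override public String getMessageId() { return delegate.getMessageId(); }
    @Override public long getUid() { return delegate.getUid(); }
    @Override public long getFullContentOctets() { return delegate.getFullContentOctets(); }
}
{code}

The mapper can then pick the headerBlobId off such a wrapper on copy/move 
instead of re-reading it, which is why the mapper test suite needs the 
read-before-copy adaptation listed in the TODOs.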

This work has been tested with the IMAP FETCH command against mailboxes 
containing 200 to 500 messages. Fetched items include the internal date; my 
load testing object storage is poor, and I did not turn on the blob store 
cache (as I did not want to run the time consuming migration), so I did not 
include headers. Preliminary results show the goal is achieved and the 
messagev3 query volume is drastically reduced, as shown by the attached 
resources...

 - Gatling (before & after) shows a sharp decrease in fetch_all latencies. 
APPEND speed proved to be unstable across the many runs I did and thus should 
not be taken into consideration.
 - Glowroot (before & after) explains the results: the top queried table 
requests (messagev3) are no longer present.

This work still deserves <3. (I don't think I will be able to continue it this 
week, which leaves time to gather feedback.)

TODO:

 - Re-adapt the DAO tests to the code
 - Adapt the mapper tests to the read-before-copy pattern
 - Fix some task tests
 - Write tests for the messagev3 fallback logic
 - Write tests for the migration
 - Do a run on a production-like environment (3 Cassandra nodes, cloud object 
storage, 3 James nodes and 6M+ emails)



> Further denormalize Message entity?
> -----------------------------------
>
>                 Key: JAMES-3576
>                 URL: https://issues.apache.org/jira/browse/JAMES-3576
>             Project: James Server
>          Issue Type: Improvement
>          Components: IMAPServer, JMAP
>    Affects Versions: 3.6.0
>            Reporter: Benoit Tellier
>            Assignee: Antoine Duprat
>            Priority: Major
>              Labels: perf
>             Fix For: 3.7.0
>
>         Attachments: imap-reorg.png, jmap-reorg.png, poc_after_gatling.png, 
> poc_after_glowroot.png, poc_before_gatling.png, poc_before_glowroot.png
>
>
> h3. The facts
> Here is our message structure:
> {code:java}
> cqlsh:apache_james> DESCRIBE TABLE imapuidtable ;
> CREATE TABLE apache_james.imapuidtable (
>     messageid timeuuid,
>     mailboxid timeuuid,
>     uid bigint,
>     flaganswered boolean,
>     flagdeleted boolean,
>     flagdraft boolean,
>     flagflagged boolean,
>     flagrecent boolean,
>     flagseen boolean,
>     flaguser boolean,
>     modseq bigint,
>     userflags set<text>,
>     PRIMARY KEY (messageid, mailboxid, uid)
> ) WITH comment = 'Holds mailbox and flags for each message, lookup by message 
> ID';
> cqlsh:apache_james> DESCRIBE TABLE messageidtable  ;
> CREATE TABLE apache_james.messageidtable (
>     mailboxid timeuuid,
>     uid bigint,
>     flaganswered boolean,
>     flagdeleted boolean,
>     flagdraft boolean,
>     flagflagged boolean,
>     flagrecent boolean,
>     flagseen boolean,
>     flaguser boolean,
>     messageid timeuuid,
>     modseq bigint,
>     userflags set<text>,
>     PRIMARY KEY (mailboxid, uid)
> ) WITH comment = 'Holds mailbox and flags for each message, lookup by mailbox 
> ID + UID';
> cqlsh:apache_james> DESCRIBE TABLE messagev3  ;
> CREATE TABLE apache_james.messagev3 (
>     messageid timeuuid PRIMARY KEY,
>     bodycontent text,
>     bodyoctets bigint,
>     bodystartoctet int, 
>     attachments list<frozen<attachments>>,
>    // and also message properties
> ) WITH comment = 'Holds message metadata, independently of any mailboxes. 
> Content of messages is stored in `blobs` and `blobparts` tables. Optimizes 
> property storage compared to V2.';
> {code}
> A very common pattern is to access message headers.
>  - imap-reorg.png (attached) shows me opening my IMAP mailbox after a long 
> weekend. We can see that my MUA lists the headers of the 108 messages 
> received in the meantime. We can see that, in order to retrieve the storage 
> information, the messagev3 table needs to be accessed for each message, 
> generating a huge count of PRIMARY KEY reads that are not strictly necessary; 
> reading messageV3 ranks second in query time.
>  - Similar things happen on top of JMAP. jmap-reorg.png shows 2 webmail email 
> list loads. Same thing: for each message entry, we need to query messagev3 to 
> retrieve storage information and be able to retrieve headers. Here messagev3 
> reads rank first, ahead of the message metadata reads and the header reads (a 
> sketch of this per-message read pattern follows).
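> To make the read amplification concrete, here is a hedged sketch (DataStax 
> Java driver 4.x, illustrative only, not the actual James DAO code) of what 
> serving such a listing amounts to today: one point read on messagev3 per 
> listed message, on top of the single mailbox partition read:
> {code:java}
> import java.util.List;
> import java.util.UUID;
> import java.util.stream.Collectors;
> import com.datastax.oss.driver.api.core.CqlSession;
> import com.datastax.oss.driver.api.core.cql.PreparedStatement;
> 
> public class HeaderListingSketch {
>     public static void main(String[] args) {
>         try (CqlSession session = CqlSession.builder().withKeyspace("apache_james").build()) {
>             // One partition read to list the mailbox content
>             List<UUID> messageIds = session
>                 .execute(session.prepare("SELECT messageid FROM messageidtable WHERE mailboxid = ?")
>                     .bind(UUID.fromString(args[0])))
>                 .all().stream().map(row -> row.getUuid("messageid")).collect(Collectors.toList());
> 
>             // ...plus one extra point read on messagev3 PER MESSAGE, just for storage information
>             PreparedStatement storageInfo = session.prepare(
>                 "SELECT bodystartoctet, bodyoctets FROM messagev3 WHERE messageid = ?");
>             messageIds.forEach(id -> session.execute(storageInfo.bind(id)));
>         }
>     }
> }
> {code}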
> h3. The bit of Cassandra philosophy we might have missed...
> https://www.datastax.com/blog/basic-rules-cassandra-data-modeling
> {code:java}
> # Non-Goals
> ## Minimize the Number of Writes
> Writes in Cassandra aren't free, but they're awfully cheap. Cassandra is 
> optimized for high write throughput, and almost all writes are equally 
> efficient [1].
> ## Minimize Data Duplication
> Denormalization and duplication of data is a fact of life with Cassandra. 
> Don't be afraid of it. [...] In order to get the most efficient reads, you 
> often need to duplicate data.
> # Basic goals
> [...]
> ## Rule 2: Minimize the Number of Partitions Read
> [...] Furthermore, even on a single node, it's more expensive to read from 
> multiple partitions than from a single one due to the way rows are stored.
> {code}
> https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html
> {code:java}
>  An incorrect data model can turn a single query into hundreds of queries, 
> resulting in increased latency, decreased throughput, and missed SLAs.
> {code}
> (This one is from an article about compaction, but my feeling is that it is 
> very relevant to the situation I describe, so I could not refrain from 
> quoting it...)
> h3. The new data-model
> I propose to do the following:
> {code:java}
> cqlsh:apache_james> ALTER TABLE messageIdTable ADD internalDate timestamp ;
> cqlsh:apache_james> ALTER TABLE messageIdTable ADD bodyStartOctet bigint  ;
> cqlsh:apache_james> ALTER TABLE messageIdTable ADD fullContentOctets bigint  ;
> cqlsh:apache_james> ALTER TABLE messageIdTable ADD headerContent text  ;
> cqlsh:apache_james> ALTER TABLE imapUidTable ADD internalDate timestamp ;
> cqlsh:apache_james> ALTER TABLE imapUidTable ADD bodyStartOctet bigint  ;
> cqlsh:apache_james> ALTER TABLE imapUidTable ADD fullContentOctets bigint  ;
> cqlsh:apache_james> ALTER TABLE imapUidTable ADD headerContent text  ;
> {code}
> That way we can easily resolve the METADATA and HEADERS FetchGroups against 
> both messageIdTable and imapUidTable, effectively limiting messageV3 reads to 
> FULL body reads, as sketched below.
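> For illustration, with the columns above in place a METADATA or HEADERS level 
> fetch can be answered from the partition that is already being read (hedged 
> sketch against the DataStax Java driver 4.x, not the actual mapper code):
> {code:java}
> import java.util.UUID;
> import com.datastax.oss.driver.api.core.CqlSession;
> import com.datastax.oss.driver.api.core.cql.Row;
> 
> public class DenormalizedFetchSketch {
>     public static void main(String[] args) {
>         try (CqlSession session = CqlSession.builder().withKeyspace("apache_james").build()) {
>             // A single read on messageIdTable now returns flags, dates and header content
>             // together: no extra messagev3 lookup for METADATA / HEADERS fetch groups.
>             Row row = session.execute(session.prepare(
>                     "SELECT flagseen, modseq, internaldate, headercontent "
>                         + "FROM messageidtable WHERE mailboxid = ? AND uid = ?")
>                 .bind(UUID.fromString(args[0]), Long.parseLong(args[1]))).one();
> 
>             System.out.println("internalDate: " + row.getInstant("internaldate"));
>             System.out.println("headers: " + row.getString("headercontent"));
>         }
>     }
> }
> {code}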
> h3. Expectations
> This will effectively reduce the Cassandra query load for both JMAP and IMAP, 
> effectively speeding up James and allowing us to scale to larger workloads 
> given the exact same infrastructure. A boost ranging from 25% to 33% is 
> expected for both IMAP, JMAP and POP3 workloads.
> h3. Migration strategy
>  - 1. The admin ALTERs the tables.
>  - 2. The admin deploys the new version of James. Newly written data is then 
> fully denormalized...
>  - 3. But previously written data still needs reads to messagev3 to be served 
> (if the expected data is not in messageIdTable or in imapUidTable, we know we 
> need to read it from the messagev3 table).
>  - 4. We propose a migration task that looks up messagev3 to populate the 
> newly added columns of messageIdTable and imapUidTable (sketched below) - 
> this way an admin can make sure to fully benefit from the enhancement for 
> previously existing data.
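> A hedged sketch of what such a migration task boils down to (illustrative 
> only; plain non-LWT writes, assuming messagev3 carries the matching 
> internalDate/headerContent properties elided from the excerpt above; only two 
> of the added columns are shown, the others are handled the same way):
> {code:java}
> import com.datastax.oss.driver.api.core.CqlSession;
> import com.datastax.oss.driver.api.core.cql.PreparedStatement;
> import com.datastax.oss.driver.api.core.cql.Row;
> 
> public class DenormalizationMigrationSketch {
>     public static void main(String[] args) {
>         try (CqlSession session = CqlSession.builder().withKeyspace("apache_james").build()) {
>             PreparedStatement readSource = session.prepare(
>                 "SELECT internaldate, headercontent FROM messagev3 WHERE messageid = ?");
>             PreparedStatement update = session.prepare(
>                 "UPDATE messageidtable SET internaldate = ?, headercontent = ? "
>                     + "WHERE mailboxid = ? AND uid = ?");
> 
>             // Walk existing mailbox entries and copy the metadata over from messagev3.
>             // Plain writes (no LWT): the task only fills in columns derived from messagev3,
>             // so re-running it, or racing with regular traffic, stays harmless.
>             for (Row entry : session.execute("SELECT mailboxid, uid, messageid FROM messageidtable")) {
>                 Row source = session.execute(readSource.bind(entry.getUuid("messageid"))).one();
>                 if (source != null) {
>                     session.execute(update.bind(
>                         source.getInstant("internaldate"), source.getString("headercontent"),
>                         entry.getUuid("mailboxid"), entry.getLong("uid")));
>                 }
>             }
>         }
>     }
> }
> {code}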
> I think the classical migration strategy is not a good fit for this one as:
>  - fallback mechanisms incur a performance degradation (doubling the amount 
> of reads during the transition period) and message metadata query speed is 
> critical. With the proposed strategy, at worst the previous behavior applies 
> during the transition period.
>  - Creating and deleting tables is messy, whereas simple in-place 
> modifications do not generate data model garbage.
>  - We can add a startup check to ensure the new columns are correctly there 
> (and abort startup if not, see the sketch below).
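> A hedged sketch of such a check (illustrative only, not an actual James 
> StartUpCheck implementation): query system_schema.columns and refuse to start 
> when the added columns are missing:
> {code:java}
> import java.util.List;
> import java.util.stream.Collectors;
> import com.datastax.oss.driver.api.core.CqlSession;
> 
> public class DenormalizationStartUpCheckSketch {
>     private static final List<String> REQUIRED =
>         List.of("internaldate", "bodystartoctet", "fullcontentoctets", "headercontent");
> 
>     public static void main(String[] args) {
>         try (CqlSession session = CqlSession.builder().build()) {
>             for (String table : List.of("messageidtable", "imapuidtable")) {
>                 List<String> columns = session.execute(session.prepare(
>                         "SELECT column_name FROM system_schema.columns "
>                             + "WHERE keyspace_name = 'apache_james' AND table_name = ?")
>                     .bind(table))
>                     .all().stream().map(row -> row.getString("column_name")).collect(Collectors.toList());
>                 if (!columns.containsAll(REQUIRED)) {
>                     throw new IllegalStateException("Missing denormalized columns on " + table
>                         + ", please run the ALTER TABLE statements before starting James");
>                 }
>             }
>         }
>     }
> }
> {code}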


