[ 
https://issues.apache.org/jira/browse/JAMES-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714014#comment-17714014
 ] 

Benoit Tellier commented on JAMES-3777:
---------------------------------------

Edit of 19/04...

 => Increments in place, with massive gains. Cf. 
https://github.com/apache/james-project/pull/1530#issuecomment-1514138889

Before: after 14 minutes I had failed to create 1000 rules.

After: this completes in 1 minute.
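To make the asymptotic difference concrete, here is a small back-of-the-envelope sketch (function names are illustrative, not James code) contrasting the old reset-based event model with in-place increments:

```python
def reset_model_entries(n_rules):
    # Old model: persisting rule i appends a "reset to [rule 1..i]" event,
    # so event i carries i rule entries; total stored = 1 + 2 + ... + n.
    return sum(range(1, n_rules + 1))

def incremental_model_entries(n_rules):
    # New model: persisting a rule appends one "add rule" event; total = n.
    return n_rules

# For 1000 rules the reset model stores ~500x more rule entries:
print(reset_model_entries(1000))        # 500500
print(incremental_model_entries(1000))  # 1000
```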

 => Snapshot

By saving the snapshot as an event (a reset of the rules), adding a static 
column tracking the latest snapshot, and skipping the history before it.

This further enhances filtering performance: 5000 rules created in 
6min18 instead of 31min42 (the last event creation took 124ms instead of 954ms).

I will propose snapshots in another pull request.
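A minimal in-memory sketch of that snapshot idea (illustrative names, not the actual James/Cassandra implementation): every few changes a snapshot event carrying the full state is appended, its position is tracked (the role played by the static column), and reads replay only the short tail after it.

```python
SNAPSHOT_EVERY = 10

class FilterEventStore:
    def __init__(self):
        self.events = []           # append-only (kind, payload) history
        self.latest_snapshot = -1  # position of latest snapshot ("static column")

    def add_rule(self, rule):
        if (len(self.events) + 1) % SNAPSHOT_EVERY == 0:
            # Persist the full state as a snapshot event and record its position.
            self.events.append(("snapshot", self.read_rules() + [rule]))
            self.latest_snapshot = len(self.events) - 1
        else:
            self.events.append(("add", rule))

    def read_rules(self):
        # Start from the latest snapshot and replay only the tail after it,
        # instead of the whole history.
        if self.latest_snapshot >= 0:
            rules = list(self.events[self.latest_snapshot][1])
            tail = self.events[self.latest_snapshot + 1:]
        else:
            rules, tail = [], self.events
        for _kind, payload in tail:
            rules.append(payload)
        return rules
```

With this sketch, a read replays at most `SNAPSHOT_EVERY` events regardless of how many rules were ever created.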

> Event sourcing - O[n²] storage for filters
> ------------------------------------------
>
>                 Key: JAMES-3777
>                 URL: https://issues.apache.org/jira/browse/JAMES-3777
>             Project: James Server
>          Issue Type: Improvement
>    Affects Versions: 3.7.0
>            Reporter: Benoit Tellier
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> h2. Symptoms
> ```  
> Largest Partitions:     
> [FilteringRule/x...@linagora.com] 44952069 (45.0 MB)
> ```
> Every time this user sends an email we load 45 MB of JSON, which has a  
> significant performance impact.
> h2. What?
> We implemented event sourcing with reset. Given rules A, B, if we want to 
> persist rule C then we store a "reset to A, B, C" event.
> So, to store N filters, the resulting structure has a size in O[n²], which 
> proves to be barely sustainable.
> h2. How to fix
> Coming back to O[n] would likely help.
> Implement filter addition / removal at both the storage and JMAP layers.
> h2.  Alternatives
> h3. The read projection
> Currently we load the full history, build the aggregate every time we 
> process an email, and perform SERIAL lightweight transactions. This happens 
> very often, and it is impactful.
> It would be possible to introduce a read projection, maintained by a 
> subscriber to the event source, that would allow efficiently reading the 
> current filters for a given user.
> This means the history would be loaded only upon writes, which are rare.
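> Such a projection could be a single small table per user; a hypothetical CQL 
> sketch (table and column names are illustrative, not the James schema):
> ```
> CREATE TABLE filters_projection (
>     user text PRIMARY KEY,
>     rules text  -- current filtering rules serialized as JSON
> );
> ```
> A subscriber would upsert this row on each filtering event, so the mail 
> processing path reads one small row instead of replaying the full history.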
> Impact: yet another table. Also the solution is local to this usage and does 
> not help other event sourcing usages.
> h3. Event sourcing snapshots
> Augment James event sourcing implementation with a Snapshot mechanism.
> Upon reading history, we would start reading available snapshots, then read 
> the history from that snapshot.
> The event store would be responsible for taking snapshots. Even one 
> snapshot every 10 changes would do the job here.
> This implies being able to serialize state. This implies an additional table 
> for storing event sourcing snapshots.
> My take on it: going `O[n²]` -> `O[n]` will likely be a good enough mitigation 
> that we don't need to grow the complexity of the event sourcing code.
> On the other hand, this would harden the event sourcing code and likely lift 
> most of the limitations on adopting it on the mailboxes write path (to 
> enforce the mailbox name uniqueness constraint).
> Note that both solutions are not exclusive.
> h3. The dirty fix
> For filters, the history prior to the reset event can be dropped; this can 
> be used to solve the immediate problem, even if it is not very clean.
> h1. Proposal
>  - Implement a read projection
>  - Implement addition / removal patches to filtering event sourcing aggregate
>  - Don't implement event sourcing snapshots now
> And also... remove the obligation to configure the JMAP filtering mailet 
> inside JMAP servers: after all, this extension is not standard...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
