[ https://issues.apache.org/jira/browse/JAMES-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714014#comment-17714014 ]
Benoit Tellier commented on JAMES-3777:
---------------------------------------

Edit of 19/04...

=> Increments in place, with massive gains

Cf. https://github.com/apache/james-project/pull/1530#issuecomment-1514138889

Before: after 14 minutes I had still failed to create 1000 rules.
After: this completes in 1 minute.

=> Snapshot

Implemented by saving the snapshot as an event (a reset of the rules), adding a
static column that tracks the latest snapshot, and skipping the history before it.

This further improves filtering performance: 5000 rules created in 6min18
instead of 31min42 (the last event creation took 124ms instead of 954ms).

I will propose the snapshot in another pull request.

> Event sourcing - O[n²] storage for filters
> ------------------------------------------
>
>                 Key: JAMES-3777
>                 URL: https://issues.apache.org/jira/browse/JAMES-3777
>             Project: James Server
>          Issue Type: Improvement
>    Affects Versions: 3.7.0
>            Reporter: Benoit Tellier
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> h2. Symptoms
> ```
> Largest Partitions:
>   [FilteringRule/x...@linagora.com] 44952069 (45.0 MB)
> ```
> Every time this user sends an email we load 45 MB of JSON, which can have a
> big performance impact.
> h2. What?
> We implemented event sourcing with reset. Given rules A and B, if we want to
> persist rule C then we store a "reset to A, B, C" event.
> So, if we want to store N filters, the resulting structure will have a size
> of O[n²], which proves to be barely sustainable.
> h2. How to fix
> Coming back to O[n] would likely help.
> Implement filter addition / removal both at the storage and JMAP layers.
> h2. Alternatives
> h3. The read projection
> Currently we are loading the full history, building the aggregate each time
> we process emails, and performing SERIAL lightweight transactions, which is
> very common, and impactful.
> It would be possible to introduce a read projection, maintained by a
> subscriber to the event source, that would allow efficiently reading the
> current filters for a given user.
> This means the history would be loaded only upon writes, which are rare.
> Impact: yet another table. Also, the solution is local to this usage and does
> not help other event sourcing usages.
> h3. Event sourcing snapshots
> Augment the James event sourcing implementation with a snapshot mechanism.
> Upon reading history, we would start by reading the latest available
> snapshot, then read the history from that snapshot.
> The event store would be responsible for taking snapshots. Even one snapshot
> every 10 changes would do the job here.
> This implies being able to serialize state. This implies an additional table
> for storing event sourcing snapshots.
> My take on it: going `O[n²]` -> `O[n]` will likely be a good enough mitigation
> that we don't need to grow the complexity of the event sourcing code.
> On the other hand, this would harden the event sourcing code and likely lift
> most of the limitations on its adoption on the mailbox write path (to enforce
> the mailbox name uniqueness constraint).
> Note that the two solutions are not mutually exclusive.
> h3. The dirty fix
> For filters, the history prior to a reset event can be dropped. This can be
> used to solve the immediate problem, even if it is not very clean.
> h1. Proposal
>  - Implement a read projection
>  - Implement addition / removal patches to the filtering event sourcing aggregate
>  - Don't implement event sourcing snapshots now
> And also... Remove the obligation to configure the JMAP filtering mailet inside
> JMAP servers: after all, this extension is not standard...

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
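
To make the O[n²] versus O[n] storage argument from the issue concrete, here is a minimal Python sketch. It is not the James implementation: the event classes (`RulesReset`, `RuleAdded`) and helper functions are hypothetical names chosen for illustration. It counts how many rule payloads end up persisted when every change stores a full "reset" event versus a single incremental event, and shows why history before the latest reset can be skipped on replay, which is the idea behind treating a reset as a snapshot.

```python
from dataclasses import dataclass


# Hypothetical event types, for illustration only (not the James classes).
@dataclass(frozen=True)
class RulesReset:
    rules: tuple  # the full rule list after the change


@dataclass(frozen=True)
class RuleAdded:
    rule: str  # only the rule that was added


def reset_history(n):
    """History where adding the i-th rule stores a 'reset to rules 0..i' event."""
    rules = [f"rule-{i}" for i in range(n)]
    return [RulesReset(tuple(rules[: i + 1])) for i in range(n)]


def incremental_history(n):
    """History where adding the i-th rule stores only that rule."""
    return [RuleAdded(f"rule-{i}") for i in range(n)]


def stored_rules(events):
    """Number of rule payloads persisted across the whole history."""
    return sum(len(e.rules) if isinstance(e, RulesReset) else 1 for e in events)


def replay(events):
    """Rebuild the current rule list by applying every event in order."""
    state = []
    for event in events:
        if isinstance(event, RulesReset):
            state = list(event.rules)
        else:
            state.append(event.rule)
    return state


def replay_from_last_reset(events):
    """Replay starting at the most recent reset; earlier history is irrelevant."""
    start = 0
    for index, event in enumerate(events):
        if isinstance(event, RulesReset):
            start = index
    return replay(events[start:])


if __name__ == "__main__":
    print(stored_rules(reset_history(1000)))        # 500500
    print(stored_rules(incremental_history(1000)))  # 1000
```

For 1000 rules the reset-only history persists 1 + 2 + ... + 1000 = 500500 rule payloads, while the incremental history persists 1000. Both histories replay to the same final rule list, and `replay_from_last_reset` returns the same state as a full replay while reading only the tail of the history.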