[ 
https://issues.apache.org/jira/browse/JAMES-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266993#comment-17266993
 ] 

René Cordier commented on JAMES-3202:
-------------------------------------

https://github.com/linagora/james-project/pull/4191 allows a full reindexing 
without data cleanup

> ReIndexing "filtering" for only outdated indexed data
> -----------------------------------------------------
>
>                 Key: JAMES-3202
>                 URL: https://issues.apache.org/jira/browse/JAMES-3202
>             Project: James Server
>          Issue Type: Improvement
>            Reporter: René Cordier
>            Priority: Major
>             Fix For: 3.6.0
>
>
> *Why?*
> ReIndexing can be slow: it requires reading all messages in the DB, then 
> triggering the full reIndexing, even when the document is not outdated.
> All these document changes create a lot of deleted documents. Lucene "marks 
> them as deleted", polluting the entire index until segment merging happens 
> (yet another costly operation). The fewer updates we do, the better. Note 
> that a partial update still leads to a full new document in Lucene, and 
> only optimises bandwidth + avoids reads.
> *Need specification*
> As an admin, I want to run a reIndex.
> We furthermore handle `RunningOptions`, allowing one to specify the message 
> rate attempted. See [https://github.com/linagora/james-project/pull/3394]
> We still need, given a message, to get its search index representation (at 
> least for its mutable data). From this we will be able to condition the 
> reindexing on outdated/non-existing data, significantly speeding up the 
> reindexing process on mostly valid indexes. The admin could then specify 
> this option via a query parameter (carried over in running options).
> *MessageSearchIndex API changes*:
> {code:java}
> interface MessageSearchIndex {
>    //...
>    Mono<Flags> retrieveIndexedFlags(MailboxId mailboxId, MessageUid uid);
>    //...
> }
> {code}
> ElasticSearch will rely on the _GET_ verb (not search).
> Unit tests will be written for this new method.
> ReIndexing `RunningOptions` will then carry over the option, which 
> ReIndexerPerformer will need to take into account.
> Sample webadmin API:
> {code:bash}
> curl -XPOST 'http://james:8000/mailboxes?action=reindex&filter=outdatedIndex'
> {code}
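
For illustration only, here is a rough sketch of how the "outdated index" filter described in the issue could decide which messages to reindex. It is not the James implementation: `OutdatedIndexFilter`, `StoredMessage`, `IndexedFlagsLookup` and the `Flag` enum are hypothetical stand-ins, and a synchronous `Optional` replaces the reactive `Mono<Flags>` of the proposed API so that the example only needs the JDK. The idea is simply to compare the flags held in the mailbox store with the flags found in the index, and to reindex when they differ or when no document is indexed.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types are the actual James classes.
final class OutdatedIndexFilter {
    enum Flag { SEEN, ANSWERED, FLAGGED, DELETED, DRAFT }

    /** Stand-in for a message as read from the mailbox store (mutable data only). */
    static final class StoredMessage {
        final long uid;
        final Set<Flag> flags;

        StoredMessage(long uid, Set<Flag> flags) {
            this.uid = uid;
            this.flags = flags;
        }
    }

    /** Stand-in for the proposed lookup; empty means no document is indexed for this uid. */
    interface IndexedFlagsLookup {
        Optional<Set<Flag>> retrieveIndexedFlags(long uid);
    }

    /** A message needs reindexing when it is missing from the index or its indexed flags differ. */
    static boolean needsReindex(StoredMessage message, IndexedFlagsLookup index) {
        return index.retrieveIndexedFlags(message.uid)
            .map(indexed -> !indexed.equals(message.flags))
            .orElse(true);
    }

    /** Keeps only the uids whose index entry is missing or outdated, in input order. */
    static List<Long> uidsToReindex(List<StoredMessage> messages, IndexedFlagsLookup index) {
        return messages.stream()
            .filter(message -> needsReindex(message, index))
            .map(message -> message.uid)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Index content: uid 1 is up to date, uid 2 is stale, uid 3 is absent.
        Map<Long, Set<Flag>> indexed = Map.of(
            1L, EnumSet.of(Flag.SEEN),
            2L, EnumSet.noneOf(Flag.class));
        IndexedFlagsLookup lookup = uid -> Optional.ofNullable(indexed.get(uid));

        List<StoredMessage> stored = List.of(
            new StoredMessage(1L, EnumSet.of(Flag.SEEN)),
            new StoredMessage(2L, EnumSet.of(Flag.FLAGGED)),
            new StoredMessage(3L, EnumSet.noneOf(Flag.class)));

        System.out.println(uidsToReindex(stored, lookup)); // prints [2, 3]
    }
}
```

On a mostly valid index, only the messages returned by `uidsToReindex` would be rewritten, which is what avoids the extra deleted documents in Lucene. The comparison covers flags only, so this sketch detects outdated mutable data and missing documents, nothing else.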



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
