Hi Chetan,

Question about the indexing step: From your description I am not sure how the indexing would be triggered for local changes. Probably not through the Async Indexer (this would not gain us much, right?). Would this be a Commit Hook?

Michael

On 23/07/15 13:48, "Chetan Mehrotra" <[email protected]> wrote:

>Hi Team,
>
>As the use of async indexes like Lucene is growing, we need to
>account for the delay in showing updated results due to the async
>nature of indexing. Depending on system load the async indexer might
>lag behind the latest state by some margin. We have improved quite a
>bit in terms of performance, but by design there would be a lag, and
>with load that lag would increase at times.
>
>For example, a typical flow in content authoring involves the user
>uploading some asset to the application. After uploading the asset he
>goes to the authoring view and looks for that uploaded asset via a
>content finder kind of UI. That UI relies on a query to show the
>available assets. Due to the delay introduced by the async indexer it
>would take some time (10-15 sec) for the asset to show up.
>
>To account for that we can go for a near real time (NRT*) in-memory
>indexing which would complement the actual persisted async indexer and
>would exploit the fact that requests from the same user in a given
>session would most likely hit the same cluster node.
>
>Below is a brief proposal. This would require changes in the layer
>above in Oak, but for now the focus is on feasibility.
>
>Proposal
>=======
>
>A - Indexing Side
>------------------
>
>The Lucene index can be configured to support NRT mode. If this mode
>is enabled then on each cluster node we would perform AsyncIndex only
>for local changes. For such an indexer LuceneIndexEditor would use a
>RAMDirectory. This directory would only have *recently* modified/added
>documents.
>
>B - Query Side
>---------------
>
>On the query side the LucenePropertyIndex would perform the search
>against two IndexSearchers
>
>1. IndexSearcher based on the persisted OakDirectory
>2. IndexSearcher obtained from the currently active IndexWriter used
>with the RAMDirectory
>
>The query would be performed against both and a merged cursor [2]
>would be returned
>
>C - Benefits
>----------------
>
>This approach would allow the user to at least see his modifications
>appear quickly in search results and would make the accuracy of the
>search results more deterministic.
>
>This feature need not be enabled globally but can be enabled on a per
>index basis, based on business requirements.
>
>D - Challenges
>-------------------
>
>1. Ensuring that the RAMDirectory is bounded and only contains
>recently modified documents. The lower limit can be based on the last
>indexed time from the AsyncIndexer. Periodically we would need to
>prune old documents from this RAMDirectory
>
>2. IndexUpdate would need to be adapted to support this hybrid model
>for the same index type, so this is something to be looked into
>
>Thoughts?
>
>Chetan Mehrotra
>
>NRT - Near Real Time is technically a Lucene term
>https://wiki.apache.org/lucene-java/NearRealtimeSearch. However, it
>is used here as the approach is a bit similar!
>
>[2] Such a merged cursor and performing the query against multiple
>searchers would anyway be required to support a zero downtime kind of
>requirement where index content would be split across a local and a
>global instance
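
To make the hybrid model in sections B and D of the quoted proposal easier to discuss, here is a minimal sketch in plain Java. It is only a model under stated assumptions: the class HybridIndexSketch and all of its methods are hypothetical and do not exist in Oak or Lucene, a set of paths stands in for the persisted OakDirectory, a map stands in for the RAMDirectory, and a path-prefix match stands in for a real query. It shows the merged lookup across both sources and the pruning bounded by the last indexed time of the async indexer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the hybrid lookup; none of these names exist in Oak or Lucene. */
class HybridIndexSketch {
    // Stand-in for the persisted index that the async indexer updates.
    private final Set<String> persisted = new LinkedHashSet<>();
    // Stand-in for the in-memory index: path -> time of the local change.
    private final Map<String, Long> recent = new LinkedHashMap<>();

    /** Records a local change so that it becomes searchable immediately. */
    void indexLocalChange(String path, long changeTime) {
        recent.put(path, changeTime);
    }

    /** Models one async cycle that has indexed all changes up to lastIndexedTime. */
    void asyncCycleCompleted(List<String> indexedPaths, long lastIndexedTime) {
        persisted.addAll(indexedPaths);
        // Prune entries now covered by the persisted index to keep memory bounded.
        recent.values().removeIf(changeTime -> changeTime <= lastIndexedTime);
    }

    /** Merged result: persisted hits first, then recent hits, without duplicates. */
    List<String> query(String pathPrefix) {
        Set<String> merged = new LinkedHashSet<>();
        for (String path : persisted) {
            if (path.startsWith(pathPrefix)) {
                merged.add(path);
            }
        }
        for (String path : recent.keySet()) {
            if (path.startsWith(pathPrefix)) {
                merged.add(path);
            }
        }
        return new ArrayList<>(merged);
    }

    /** Number of entries currently held in memory only. */
    int recentSize() {
        return recent.size();
    }

    public static void main(String[] args) {
        HybridIndexSketch index = new HybridIndexSketch();
        index.asyncCycleCompleted(List.of("/content/dam/old.png"), 100L);
        index.indexLocalChange("/content/dam/new.png", 150L);
        // The fresh upload is visible before the next async cycle runs.
        System.out.println(index.query("/content/dam"));
    }
}
```

The sketch leaves open how indexLocalChange would be triggered, which is the question raised at the top of this mail. It also removes duplicates by path only, whereas a real merged cursor would additionally have to prefer the newer version of a document that exists in both sources.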
