Hi Chetan,

Question about the indexing step:
From your description I am not sure how the indexing would be triggered for 
local changes. Probably not through the Async Indexer (this would not gain us 
much, right?). Would this be a Commit Hook?

Michael




On 23/07/15 13:48, "Chetan Mehrotra" <[email protected]> wrote:

>Hi Team,
>
>As the use of async indexes like Lucene grows, we need to account for
>the delay in showing updated results caused by the async nature of
>indexing. Depending on system load, the async indexer might lag behind
>the latest state by some margin. We have improved quite a bit in terms
>of performance, but by design there will be a lag, and under load that
>lag will increase at times.
>
>For example, a typical flow in content authoring involves the user
>uploading an asset to the application. After uploading the asset, he
>goes to the authoring view and looks for that uploaded asset via a
>content finder kind of UI. That UI relies on a query to show the
>available assets. Due to the delay introduced by the async indexer, it
>would take some time (10-15 sec) for the uploaded asset to show up.
>
>To account for that we can go for near real time (NRT*) in-memory
>indexing, which would complement the actual persisted async indexer and
>would exploit the fact that requests from the same user in a given
>session would most likely hit the same cluster node.
>
>Below is a brief proposal. This would require changes in the layer
>above in Oak, but for now the focus is on feasibility.
>
>Proposal
>=======
>
>A - Indexing Side
>------------------
>
>The Lucene index can be configured to support NRT mode. If this mode
>is enabled then on each cluster node we would perform AsyncIndex only
>for local changes. For such an indexer, LuceneIndexEditor would use a
>RAMDirectory. This directory would only hold *recently* modified/added
>documents.
>
>B - Query Side
>---------------
>
>On the query side, the LucenePropertyIndex would perform the search
>against two IndexSearchers:
>
>1. An IndexSearcher based on the persisted OakDirectory
>2. An IndexSearcher obtained from the currently active IndexWriter used
>with the RAMDirectory
>
>The query would be performed against both and a merged cursor [2] would
>be returned.
>
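To make the merged cursor described above concrete, here is a minimal, language-neutral sketch in Python. It does not use any Lucene or Oak API. The function name `merged_cursor`, the choice to return in-memory hits first, and de-duplication by path are all assumptions for illustration; the proposal does not specify them.

```python
from typing import Iterable, Iterator


def merged_cursor(recent_paths: Iterable[str],
                  persisted_paths: Iterable[str]) -> Iterator[str]:
    """Yield each matching path once, in-memory (recent) hits first."""
    seen = set()
    # Assumption: results from the in-memory index take precedence.
    for source in (recent_paths, persisted_paths):
        for path in source:
            if path not in seen:
                seen.add(path)
                yield path


# Hypothetical hits: the freshly uploaded asset is only in the in-memory
# index, while /content/a was re-saved and therefore matches in both.
recent = ["/content/new-asset", "/content/a"]
persisted = ["/content/a", "/content/b"]
merged = list(merged_cursor(recent, persisted))
```

This sketch only removes duplicates. It does not handle a document whose latest version no longer matches the query while its persisted version still does.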
>C - Benefits
>----------------
>
>This approach would allow the user to at least see his modifications
>appear quickly in search results and would make search result
>accuracy more deterministic.
>
>This feature need not be enabled globally but can be enabled on a
>per-index basis, based on business requirements.
>
>D - Challenges
>-------------------
>1. Ensuring that the RAMDirectory is bounded and only contains recently
>modified documents. The lower limit can be based on the last indexed
>time from the AsyncIndexer. Periodically we would need to prune old
>documents from this RAMDirectory.
>
>2. IndexUpdate would need to be adapted to support this hybrid model
>for the same index type, so this is something to be looked into.
>
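The pruning in challenge 1 could look roughly like the following Python sketch. `RecentDocuments` is a hypothetical stand-in for the in-memory index, not an Oak or Lucene class, and the numeric timestamps are made up. It also assumes the async indexer has covered every change up to and including its last indexed time.

```python
from typing import Dict, List


class RecentDocuments:
    """Hypothetical stand-in for the in-memory index: path -> modified time."""

    def __init__(self) -> None:
        self._modified_at: Dict[str, float] = {}

    def add(self, path: str, modified_at: float) -> None:
        self._modified_at[path] = modified_at

    def prune(self, last_indexed_time: float) -> int:
        """Drop documents the async indexer already covers; return the count."""
        # Assumption: changes at or before last_indexed_time are persisted.
        stale = [p for p, t in self._modified_at.items()
                 if t <= last_indexed_time]
        for p in stale:
            del self._modified_at[p]
        return len(stale)

    def paths(self) -> List[str]:
        return sorted(self._modified_at)


index = RecentDocuments()
index.add("/content/a", 100.0)
index.add("/content/b", 200.0)
index.add("/content/c", 300.0)
removed = index.prune(last_indexed_time=200.0)
```

After the prune call only `/content/c` remains, so the in-memory structure stays bounded by the async indexer's lag rather than by total content size.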
>Thoughts?
>
>Chetan Mehrotra
>
>NRT - Near Real Time is technically a Lucene term
>https://wiki.apache.org/lucene-java/NearRealtimeSearch. However, I am
>using it here as the approach is a bit similar!
>
>[2] Such a merged cursor, and performing the query against multiple
>searchers, would anyway be required to support a zero-downtime kind of
>requirement where index content would be split across local and global
>instances.
