On Thu, Aug 11, 2016 at 7:33 PM, Ian Boston <[email protected]> wrote:
> That probably means the queue should only
> contain pointers to Documents and only index the Document as retrieved. I
> dont know if that can ever work.

That would not work, as what a document looks like would vary across
cluster nodes, and what is to be considered a valid entry is also not
defined at that level.

> Run a single thread on the master, that indexes into a co-located ES
> cluster.

That looks like the safe way, and it keeps things simple.

> BTW, how does Hybrid manage to parallelise the indexing and maintain
> consistency ?

Hybrid indexes do not affect async indexes. Under this approach each
cluster node maintains its own local indexes, which only contain local
changes [1]. These indexes are not aware of similar indexes on other
cluster nodes. Further, the indexes are supposed to only contain
entries from the last async indexing cycle. Older entries are purged
[2]. A query would then consult both indexes (an IndexSearcher backed
by a MultiReader, with 1 reader from the async index and 1 (or 2) from
the local index).
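
To make the interplay of local and async indexes concrete, here is a
minimal sketch in plain Java. It is not Oak or Lucene code: the class
HybridIndexSketch, its method names, and the "term to paths" map used
as an index are all hypothetical stand-ins. It only models the idea
that local changes go into a per-cycle local copy, that at most 2
local copies are kept (see [2]), and that a query merges the async
index with the retained local copies.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not Oak or Lucene code: an "index" is a map from term to paths.
class HybridIndexSketch {
    // Assumed retention limit, taken from footnote [2] ("at most 2 indexes are retained").
    static final int MAX_LOCAL_COPIES = 2;

    private final Map<String, Set<String>> asyncIndex = new HashMap<>();
    // Local index copies, newest at the head of the deque.
    private final Deque<Map<String, Set<String>>> localCopies = new ArrayDeque<>();

    HybridIndexSketch() {
        localCopies.addFirst(new HashMap<>());
    }

    // Record a change made on this cluster node in the newest local copy.
    void indexLocally(String term, String path) {
        localCopies.peekFirst().computeIfAbsent(term, k -> new HashSet<>()).add(path);
    }

    // Simulate the end of an async indexing cycle: the async index absorbs the
    // given entries, a fresh local copy is started, and copies beyond the limit are dropped.
    void asyncCycleCompleted(Map<String, Set<String>> caughtUp) {
        for (Map.Entry<String, Set<String>> e : caughtUp.entrySet()) {
            asyncIndex.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
        }
        localCopies.addFirst(new HashMap<>());
        while (localCopies.size() > MAX_LOCAL_COPIES) {
            localCopies.removeLast();
        }
    }

    // Consult the async index and every retained local copy, then merge the results.
    Set<String> query(String term) {
        Set<String> result = new TreeSet<>();
        result.addAll(asyncIndex.getOrDefault(term, Collections.emptySet()));
        for (Map<String, Set<String>> local : localCopies) {
            result.addAll(local.getOrDefault(term, Collections.emptySet()));
        }
        return result;
    }

    int localCopyCount() {
        return localCopies.size();
    }
}
```

In this sketch an entry added through indexLocally is visible to
query right away, and it stays visible from the local copy until the
copy is rotated out, by which time the async index is assumed to have
picked it up.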

Also note that the QueryEngine would enforce and re-evaluate the
property restrictions. So even if the index has an entry based on old
state, the QE would filter it out if it does not match the criteria
per the current repository state. So the aim here is to have the index
provide a superset of the result set.
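
The re-evaluation step can be sketched as follows. This is plain Java
and not the Oak QueryEngine API: RestrictionRecheckSketch, filter, and
the "path to current property value" map are hypothetical names used
only to illustrate why a stale index entry is harmless as long as the
index returns a superset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model, not the Oak QueryEngine API.
class RestrictionRecheckSketch {
    // candidates: paths returned by the index, possibly based on old state.
    // currentState: path -> current value of the queried property in the repository.
    static List<String> filter(List<String> candidates,
                               Map<String, String> currentState,
                               String expectedValue) {
        List<String> matches = new ArrayList<>();
        for (String path : candidates) {
            // Keep the path only if the current repository state still
            // satisfies the restriction; missing paths yield null and are dropped.
            if (expectedValue.equals(currentState.get(path))) {
                matches.add(path);
            }
        }
        return matches;
    }
}
```

Note that this only removes false positives. An entry missing from the
index cannot be recovered by the filter, which is why the index has to
err on the side of returning too much.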

In all this, the async index logic remains the same (single threaded)
and based on diff. So it would remain consistent with the repository
state.

Chetan Mehrotra
[1] They might also contain entries which are determined based on an
external diff. Read [3] for details.
[2] Purge here is done by maintaining a different local index copy for
each async indexing cycle. At most 2 indexes are retained and older
indexes are removed. This keeps the index small.
[3] https://issues.apache.org/jira/browse/OAK-4412?focusedCommentId=15405340&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15405340
