On Thu, Aug 11, 2016 at 7:33 PM, Ian Boston <[email protected]> wrote:
> That probably means the queue should only
> contain pointers to Documents and only index the Document as retrieved. I
> dont know if that can ever work.

That would not work, because what a document looks like would vary across
cluster nodes, and what counts as a valid entry is also not defined at that
level.

> Run a single thread on the master, that indexes into a co-located ES cluster.

While keeping things simple, that looks like the safe way.

> BTW, how does Hybrid manage to parallelise the indexing and maintain consistency ?

Hybrid indexes do not affect async indexes. Under this approach each cluster
node maintains its own local index, which contains only local changes [1].
These indexes are not aware of the similar indexes on other cluster nodes.
Further, the local indexes are supposed to contain only entries from the last
async indexing cycle. Older entries are purged [2].

A query then consults both indexes (an IndexSearcher backed by a MultiReader,
with 1 reader from the async index and 1 (or 2) from the local index).

Also note that the QueryEngine enforces and reevaluates the property
restrictions. So even if an index has an entry based on old state, the QE
filters it out if it does not match the criteria per the current repository
state. The aim here is for the index to provide a superset of the result set.

In all this the async index logic remains the same (single threaded) and based
on diff, so it remains consistent with the repository state.

Chetan Mehrotra

[1] They might also contain entries which are determined based on external
diff. Read [3] for details.
[2] Purge here is done by maintaining a different local index copy for each
async indexing cycle. At most 2 indexes are retained and older indexes are
removed. This keeps the index small.
[3] https://issues.apache.org/jira/browse/OAK-4412?focusedCommentId=15405340&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15405340
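
To make the query-side behaviour described above concrete, here is a minimal
sketch in plain Python. It is only a toy model under stated assumptions, not
Oak or Lucene code: the names LocalIndexes, on_async_cycle_completed and query
are hypothetical, indexes are modelled as dicts from path to an indexed
property value, and the repository as a dict holding the current state. It
shows the three ideas from the explanation: candidates come from the union of
the async and local indexes, at most 2 local copies are retained, and the
final check against current repository state drops stale entries.

```python
from collections import deque


class LocalIndexes:
    """Per-node local index copies, one per async indexing cycle (hypothetical model)."""

    def __init__(self, max_copies=2):
        # deque(maxlen=...) silently drops the oldest copy when a new one is added
        self.copies = deque(maxlen=max_copies)
        self.copies.append({})  # copy for the current cycle

    def add(self, path, value):
        # Local changes always go into the newest copy.
        self.copies[-1][path] = value

    def on_async_cycle_completed(self):
        # Start a fresh copy; anything older than the retained copies is purged.
        self.copies.append({})

    def paths_matching(self, value):
        return {p for copy in self.copies for p, v in copy.items() if v == value}


def query(repo, async_index, local, value):
    # Union of both indexes: a superset of the real result set.
    candidates = {p for p, v in async_index.items() if v == value}
    candidates |= local.paths_matching(value)
    # Re-evaluate the restriction against current repository state,
    # which filters out entries based on old state.
    return sorted(p for p in candidates if repo.get(p) == value)


repo = {"/a": "x", "/b": "y", "/c": "x"}   # current state: /b no longer matches
async_index = {"/a": "x", "/b": "x"}       # lags behind: stale /b, missing /c
local = LocalIndexes()
local.add("/c", "x")                       # recent local change
print(query(repo, async_index, local, "x"))  # ['/a', '/c']
```

The stale /b entry is returned by the async index but removed by the final
check, while /c is found only through the local index. Once two further async
cycles complete, the copy holding /c is dropped, on the assumption that the
async index has caught up by then.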
