Hi,

On 11 August 2016 at 13:03, Chetan Mehrotra <[email protected]> wrote:
> On Thu, Aug 11, 2016 at 5:19 PM, Ian Boston <[email protected]> wrote:
> > correct.
> > Documents are shared by ID so all updates hit the same shard.
> > That may result in network traffic if the shard is not local.
>
> Focusing on ordering part as that is the most critical aspect compared
> to other. (BAckup and Restore with sharded index is a separate problem
> to discuss but later)
>
> So even if there is a single master for a given path how would it
> order the changes. Given local changes only give partial view of end
> state.
>

In theory, the index should be driven by the eventual consistency of the source repository, eventually reaching the same consistent state, and updating on each state change. That probably means the queue should only contain pointers to Documents, and the indexer should only index the Document as retrieved. I don't know if that can ever work.

>
> Also in such a setup would each query need to consider multiple shards
> for final result or each node would "eventually" sync index changes
> from other nodes (complete replication) and query would only use local
> index
>
> For me ensuring consistency in how index updates are sent to ES wrt
> Oak view of changes was kind of blocking feature to enable
> parallelization of indexing process. It needs to be ensured that for
> concurrent commit end result in index is in sync with repository
> state.
>

Agreed. It has blocked me too, on various attempts.

>
> Current single thread async index update avoid all such race condition.
>

Perhaps this is the "root" of the problem. The only way to index Oak consistently is with a single thread globally, as is done now. That's still possible with ES: run a single thread on the master that indexes into a co-located ES cluster. If the full text extraction is distributed, then the master only needs the resources to write to the local shard. It's not as good as parallelising the queue, but given the structure of Oak it might be the only way.
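
To make the pointer-queue idea concrete, here is a rough sketch in Java. It is only an illustration under stated assumptions: the class and method names are hypothetical, the two maps are in-memory stand-ins for the repository and the ES index (no Oak or Elasticsearch API is used), and it assumes every change enqueues a pointer and that a read returns the latest state of the Document. Whether a clustered Oak can satisfy that last assumption is exactly the open question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: none of these names are Oak or Elasticsearch APIs. */
public class PointerQueueIndexer {
    // Stand-in for the repository: document id -> current content.
    private final Map<String, String> repository = new ConcurrentHashMap<>();
    // Stand-in for the ES index: document id -> indexed content.
    private final Map<String, String> index = new ConcurrentHashMap<>();
    // The queue carries only document ids (pointers), never content.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Any writer: change the repository first, then enqueue a pointer. */
    public void save(String id, String content) {
        repository.put(id, content);
        queue.add(id);
    }

    /** Deletes are also just a pointer; the indexer sees the Document is gone. */
    public void delete(String id) {
        repository.remove(id);
        queue.add(id);
    }

    /** Meant to be called from the single indexing thread on the master. */
    public void drain() {
        String id;
        while ((id = queue.poll()) != null) {
            // Index the Document as retrieved now, not as it was when queued.
            String current = repository.get(id);
            if (current == null) {
                index.remove(id);
            } else {
                index.put(id, current);
            }
        }
    }

    public Map<String, String> indexView() {
        return new HashMap<>(index);
    }

    public Map<String, String> repositoryView() {
        return new HashMap<>(repository);
    }

    public static void main(String[] args) {
        PointerQueueIndexer indexer = new PointerQueueIndexer();
        indexer.save("/a", "v1");
        indexer.save("/a", "v2");
        indexer.save("/b", "x");
        indexer.delete("/b");
        indexer.drain();
        // Compare the index with the repository after the queue is empty.
        System.out.println(indexer.indexView().equals(indexer.repositoryView()));
    }
}
```

Because each queue entry is re-read at indexing time, stale or duplicated pointers are harmless in this sketch: processing the same id twice simply writes the same current state twice. The ordering problem moves from the queue to the read, which is why the assumption about reads matters.
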
Even so, future revisions will be in the index long before Oak has synced the root document. The current implementation doesn't have to think about this, as the indexing is single threaded globally *and* each segment update is committed first by a hard Lucene commit and second by a root document sync, guaranteeing the sequential nature of the updates.

BTW, how does Hybrid manage to parallelise the indexing and maintain consistency?

Best Regards
Ian

>
> Chetan Mehrotra
>
