Re: Oak Indexing. Was Re: Property index replacement / evolution

Torgeir Veimo Thu, 11 Aug 2016 03:53:02 -0700

ES 2.3.5 is currently on Lucene 5.5, while oak-lucene is at 4.7.1. Maybe
this would inspire upgrading oak-lucene as well, to avoid having multiple
different bundled Lucene versions?


https://issues.apache.org/jira/browse/OAK-3150

On 11 August 2016 at 20:43, Chetan Mehrotra <[email protected]>
wrote:

> > https://github.com/ieb/oak-es
>
> btw this looks interesting and something we can build upon. This can
> benefit from a refactoring of LuceneIndexEditor to separate the logic
> of interpreting the Oak indexing config during editor invocation from
> constructing the Lucene document. If we decouple that logic then it
> would be possible to plug in an ES editor which just converts those
> properties per ES requirements. Hence it gets all the benefits of
> aggregation, the relative property implementation etc. (which is very
> Oak-specific stuff). This effort has been discussed but we never got
> time to do it so far. Something along the lines of what you are doing
> at [2].
>
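
A minimal sketch of the decoupling described above, assuming nothing
about the real LuceneIndexEditor internals. Every type and method name
here (DecoupledEditorSketch, IndexableProperty, DocumentMaker,
EsDocumentMaker, index) is hypothetical and does not exist in Oak, and
the "ES document" is only a plain map standing in for whatever an ES
client would accept:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; none of these types exist in Oak.
class DecoupledEditorSketch {

    /** Backend-neutral result of interpreting the index config for one property. */
    static final class IndexableProperty {
        final String name;
        final Object value;

        IndexableProperty(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Backend-specific step: turns interpreted properties into a document of type D. */
    interface DocumentMaker<D> {
        D makeDocument(String path, List<IndexableProperty> props);
    }

    /** Stand-in for an ES backend: builds a JSON-like map instead of a Lucene document. */
    static final class EsDocumentMaker implements DocumentMaker<Map<String, Object>> {
        @Override
        public Map<String, Object> makeDocument(String path, List<IndexableProperty> props) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("path", path);
            for (IndexableProperty p : props) {
                doc.put(p.name, p.value);
            }
            return doc;
        }
    }

    /**
     * Shared editor logic: decides which properties are indexed (here a simple
     * name whitelist stands in for the Oak index definition), then delegates
     * document construction to the backend.
     */
    static <D> D index(String path, Map<String, Object> nodeProps,
                       Set<String> indexedNames, DocumentMaker<D> maker) {
        List<IndexableProperty> selected = new ArrayList<>();
        for (Map.Entry<String, Object> e : nodeProps.entrySet()) {
            if (indexedNames.contains(e.getKey())) {
                selected.add(new IndexableProperty(e.getKey(), e.getValue()));
            }
        }
        return maker.makeDocument(path, selected);
    }
}
```

With this split, a Lucene-backed DocumentMaker and an ES-backed one
would share the same config interpretation in index().
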
> Another approach: with the recent refactoring done in OAK-4566, my plan
> was to plug in an ES-based LuceneIndexWriter (ignore the name for now!)
> and convert the Lucene Document to some ES document counterpart, and
> then provide just the query implementation. This would also allow us to
> reuse most of the test cases we have in oak-lucene.
>
> Chetan Mehrotra
> [2] https://github.com/ieb/oak-es/blob/master/src/main/java/org/apache/jackrabbit/oak/plusing/index/es/index/take2/ESIndexEditorContext.java
>
> On Thu, Aug 11, 2016 at 3:40 PM, Chetan Mehrotra
> <[email protected]> wrote:
> > On Thu, Aug 11, 2016 at 3:03 PM, Ian Boston <[email protected]> wrote:
> >> Both Solr Cloud and ES address this by sharding and
> >> replicating the indexes, so that all commits are soft, instant and real
> >> time. That introduces problems.
> > ...
> >> Both Solr Cloud and ES address this by sharding and
> >> replicating the indexes, so that all commits are soft, instant and real
> >> time.
> >
> > This would really be useful. However, I have a couple of aspects to
> > clear up.
> >
> > Index Update Guarantee
> > --------------------------------
> >
> > Let's say a commit succeeds, then we update the index, and the index
> > update fails for some reason. Would that update be missed, or can
> > there be some mechanism to recover? I am not very sure about the WAL
> > here; that may be the answer, but I am still confirming.
> >
> > In Oak, the way the async index update works, based on checkpoints,
> > ensures that the index would "eventually" contain the right data and
> > no update would be lost. If there is a failure in an index update,
> > that cycle fails and the next cycle starts again from the same base
> > state.
> >
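
The retry-from-same-base-state behaviour described above can be
sketched roughly as follows. This is a toy model and not Oak's actual
async indexer: revisions are plain longs, and CheckpointCycleSketch,
IndexBackend and runCycle are hypothetical names used only for
illustration.

```java
// Hypothetical model of a checkpoint-driven index cycle; not Oak code.
class CheckpointCycleSketch {

    /** Anything that can apply the changes between two revisions to an index. */
    interface IndexBackend {
        void apply(long fromRevision, long toRevision) throws Exception;
    }

    // Last revision the index is known to fully reflect (the "checkpoint").
    private long indexedUpTo;

    CheckpointCycleSketch(long startRevision) {
        this.indexedUpTo = startRevision;
    }

    /** One cycle: index the diff from the stored checkpoint to the given head. */
    boolean runCycle(long head, IndexBackend backend) {
        try {
            backend.apply(indexedUpTo, head);
            indexedUpTo = head; // advance only after the update succeeded
            return true;
        } catch (Exception e) {
            // Checkpoint is left unchanged, so the next cycle starts again
            // from the same base state and no update is skipped.
            return false;
        }
    }

    long indexedUpTo() {
        return indexedUpTo;
    }
}
```

A failed cycle at head 15 followed by a successful one at head 17 would
index the range from the original checkpoint to 17 in one go.
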
> > Order of index update
> > -----------------------------
> >
> > Let's say I have 2 cluster nodes where the same node is being modified:
> >
> > Original state /a {x:1}
> >
> > Cluster Node N1 - /a {x:1, y:2}
> > Cluster Node N2 - /a {x:1, z:3}
> >
> > End State /a {x:1, y:2, z:3}
> >
> > At the Oak level both commits would succeed as there is no conflict.
> > However, N1 and N2 would not see each other's updates immediately;
> > that would depend on the background read. So in this case, what would
> > the index update look like?
> >
> > 1. Would the index update for specific paths go to some master which
> > would order the updates?
> > 2. Or would it end up with either {x:1, y:2} or {x:1, z:3}?
> >
> > Here the current async index update logic ensures that it sees the
> > eventually expected order of changes and hence would be consistent
> > with the repository state.
> >
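
To make the two outcomes in the /a example concrete, here is a small
illustration. It assumes (this is an assumption, not something stated
about ES or oak-es) that each cluster node pushes its whole local view
of /a as a full document and that the index keeps the last one written.
IndexOrderSketch and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy illustration of the /a example; not Oak or ES code.
class IndexOrderSketch {

    /** Push model: each cluster node sends its full local view; last write wins. */
    static Map<String, Integer> pushFullDocs(Map<String, Integer> first,
                                             Map<String, Integer> second) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        index.put("/a", new TreeMap<>(first));
        index.put("/a", new TreeMap<>(second)); // overwrites the earlier document
        return index.get("/a");
    }

    /**
     * Async model: index from the merged repository state once both commits
     * are visible. putAll is only a valid merge here because the two commits
     * add different, non-conflicting properties.
     */
    static Map<String, Integer> indexMergedState(Map<String, Integer> base,
                                                 Map<String, Integer> n1,
                                                 Map<String, Integer> n2) {
        Map<String, Integer> merged = new TreeMap<>(base);
        merged.putAll(n1);
        merged.putAll(n2);
        return merged;
    }
}
```

Under that assumption the push model leaves the index with whichever of
the two documents arrived last, while indexing from the merged state
gives {x:1, y:2, z:3}.
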
> > Backup and Restore
> > ---------------------------
> >
> > Would the backup now involve backing up ES index files from each
> > cluster node? Or, assuming full replication, would it involve backing
> > up files from any one of the nodes? Would the backup be in sync with
> > the last changes done in the repository (assuming a sudden shutdown
> > where changes got committed to the repository but not yet to any
> > index)?
> >
> > Here the current approach of storing index files as part of MVCC
> > storage ensures that the index state is consistent with some
> > "checkpointed" state in the repository. Post restart it would
> > eventually catch up with the current repository state and hence would
> > not require a complete rebuild of the index after unclean shutdowns.
> >
> >
> > Chetan Mehrotra
>



-- 
-Tor
