ES 2.3.5 is currently on Lucene 5.5, while oak-lucene is at 4.7.1. Maybe this would inspire upgrading oak-lucene as well, to avoid having multiple different bundled Lucene versions?
https://issues.apache.org/jira/browse/OAK-3150

On 11 August 2016 at 20:43, Chetan Mehrotra wrote:

> > https://github.com/ieb/oak-es
>
> btw this looks interesting and something we can build upon. This can
> benefit from a refactoring of LuceneIndexEditor to separate the logic
> of interpreting the Oak indexing config during editor invocation from
> constructing the Lucene document. If we decouple that logic then it would
> be possible to plug in an ES Editor which just converts those
> properties per ES requirement. Hence it gets all the benefits of
> aggregation, relative property implementation etc (which is very Oak
> specific stuff). This effort has been discussed but we never got time
> to do that so far. Something on the lines of what you are doing at [2]
>
> Another approach - With the recent refactoring done in OAK-4566 my plan
> was to plug in an ES based LuceneIndexWriter (ignore the name for now!)
> and convert the Lucene Document to some ES Document counterpart. And
> then provide just the query implementation. This would also allow us to
> reuse most of the test cases we have in oak-lucene
>
> Chetan Mehrotra
> [2] https://github.com/ieb/oak-es/blob/master/src/main/java/org/apache/jackrabbit/oak/plusing/index/es/index/take2/ESIndexEditorContext.java
>
> On Thu, Aug 11, 2016 at 3:40 PM, Chetan Mehrotra wrote:
> > On Thu, Aug 11, 2016 at 3:03 PM, Ian Boston wrote:
> >> Both Solr Cloud and ES address this by sharding and
> >> replicating the indexes, so that all commits are soft, instant and real
> >> time. That introduces problems.
> > ...
> >> Both Solr Cloud and ES address this by sharding and
> >> replicating the indexes, so that all commits are soft, instant and real
> >> time.
> >
> > This would really be useful. However I have a couple of aspects to clear
> >
> > Index Update Guarantee
> > --------------------------------
> >
> > Let's say a commit succeeds and then we update the index, and the index
> > update fails for some reason. Then would that update be missed, or
> > can there be some mechanism to recover? I am not very sure about WAL
> > here; that may be the answer, but still confirming.
> >
> > In Oak, with the way async index update works based on checkpoints, it is
> > ensured that the index would "eventually" contain the right data and no
> > update would be lost. If there is a failure in an index update then that
> > cycle would fail and the next cycle would start again from the same base state
> >
> > Order of index update
> > -----------------------------
> >
> > Let's say I have 2 cluster nodes where an update to the same node is being performed
> >
> > Original state /a {x:1}
> >
> > Cluster Node N1 - /a {x:1, y:2}
> > Cluster Node N2 - /a {x:1, z:3}
> >
> > End State /a {x:1, y:2, z:3}
> >
> > At the Oak level both commits would succeed as there is no conflict.
> > However N1 and N2 would not be seeing each other's updates immediately,
> > and that would depend on the background read. So in this case, what would
> > the index update look like?
> >
> > 1. Would the index update for specific paths go to some master which would
> > order the update
> > 2. Or it would end up with either of {x:1, y:2} or {x:1, z:3}
> >
> > Here the current async index update logic ensures that it sees the
> > eventually expected order of changes and hence would be consistent
> > with the repository state.
> >
> > Backup and Restore
> > ---------------------------
> >
> > Would the backup now involve backup of ES index files from each
> > cluster node? Or, assuming full replication, would it involve backup of
> > files from any one of the nodes? Would the backup be in sync with the last
> > changes done in the repository (assuming a sudden shutdown where changes got
> > committed to the repository but not yet to any index)?
> >
> > Here the current approach of storing index files as part of MVCC storage
> > ensures that the index state is consistent with some "checkpointed" state in
> > the repository. And post restart it would eventually catch up with the
> > current repository state and hence would not require a complete rebuild
> > of the index in case of unclean shutdowns
> >
> >
> > Chetan Mehrotra
>

--
-Tor
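
To make the "Order of index update" question in the quoted mail concrete, here is a minimal sketch in plain Java. It is not Oak or ES code: the class and method names (`IndexOrderSketch`, `merge`, `indexLastWriterWins`) are hypothetical, and the "whole-document replace" behaviour is an assumption about how a naive per-node push to the index might behave, not a statement about what ES actually does. It only shows why indexing each node's local view can leave the index at {x:1, y:2} or {x:1, z:3}, while indexing the merged repository state gives {x:1, y:2, z:3}.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Oak or ES.
class IndexOrderSketch {

    // Repository-level merge: both commits succeed because they touch
    // different properties, so the end state is the union of the changes.
    static Map<String, Integer> merge(Map<String, Integer> base,
                                      Map<String, Integer> n1,
                                      Map<String, Integer> n2) {
        Map<String, Integer> merged = new TreeMap<>(base);
        merged.putAll(n1);
        merged.putAll(n2);
        return merged;
    }

    // Assumed naive indexing: every cluster node pushes its own local view
    // and each push replaces the whole indexed document, so the last push wins.
    static Map<String, Integer> indexLastWriterWins(List<Map<String, Integer>> localViews) {
        Map<String, Integer> doc = new TreeMap<>();
        for (Map<String, Integer> view : localViews) {
            doc = new TreeMap<>(view); // whole-document replace
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Integer> base = new TreeMap<>(Map.of("x", 1));
        Map<String, Integer> n1 = new TreeMap<>(Map.of("x", 1, "y", 2)); // view on N1
        Map<String, Integer> n2 = new TreeMap<>(Map.of("x", 1, "z", 3)); // view on N2

        // Indexing from the merged repository state (as an ordered,
        // checkpoint-driven update would eventually see it).
        System.out.println("merged state:        " + merge(base, n1, n2));

        // Indexing from each node's local view; the result depends on arrival order.
        System.out.println("N1 then N2 (naive):  " + indexLastWriterWins(List.of(n1, n2)));
        System.out.println("N2 then N1 (naive):  " + indexLastWriterWins(List.of(n2, n1)));
    }
}
```

Under these assumptions the naive variant loses either y or z depending on which push arrives last, which is option 2 in the quoted mail; avoiding that needs either a single ordering point (option 1) or indexing from the merged state.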
