> https://github.com/ieb/oak-es
By the way, this looks interesting and is something we can build upon. It
would benefit from refactoring LuceneIndexEditor to separate the logic that
interprets the Oak indexing config during editor invocation from the logic
that constructs the Lucene document. Once that logic is decoupled, it would
be possible to plug in an ES editor which just converts those properties per
ES requirements. It would then get all the benefits of aggregation, the
relative property implementation, etc. (which is very Oak-specific stuff).
This effort has been discussed, but we never got time to do it so far. It is
something along the lines of what you are doing at [2].

Another approach: with the recent refactoring done in OAK-4566, my plan was
to plug in an ES-based LuceneIndexWriter (ignore the name for now!) and
convert the Lucene Document to some ES document counterpart, and then provide
just the query implementation. This would also allow reusing most of the
test cases we have in oak-lucene.

Chetan Mehrotra
[2] https://github.com/ieb/oak-es/blob/master/src/main/java/org/apache/jackrabbit/oak/plusing/index/es/index/take2/ESIndexEditorContext.java

On Thu, Aug 11, 2016 at 3:40 PM, Chetan Mehrotra <[email protected]> wrote:
> On Thu, Aug 11, 2016 at 3:03 PM, Ian Boston <[email protected]> wrote:
>> Both Solr Cloud and ES address this by sharding and
>> replicating the indexes, so that all commits are soft, instant and real
>> time. That introduces problems.
> ...
>> Both Solr Cloud and ES address this by sharding and
>> replicating the indexes, so that all commits are soft, instant and real
>> time.
>
> This would really be useful. However, I have a couple of aspects to clear up.
>
> Index Update Guarantee
> --------------------------------
>
> Let's say a commit succeeds, we then update the index, and the index
> update fails for some reason. Would that update be missed, or is
> there some mechanism to recover? I am not very sure about the WAL
> here; that may be the answer, but I am still confirming.
>
> In Oak, the way the async index update works based on checkpoints
> ensures that the index would "eventually" contain the right data and no
> update would be lost. If there is a failure in an index update, then that
> cycle would fail and the next cycle would start again from the same base state.
>
> Order of index update
> -----------------------------
>
> Let's say I have 2 cluster nodes where the same node is being modified:
>
> Original state /a {x:1}
>
> Cluster Node N1 - /a {x:1, y:2}
> Cluster Node N2 - /a {x:1, z:3}
>
> End State /a {x:1, y:2, z:3}
>
> At the Oak level both commits would succeed as there is no conflict.
> However, N1 and N2 would not see each other's updates immediately;
> that would depend on the background read. So in this case, what would
> the index update look like?
>
> 1. Would index updates for specific paths go to some master which would
>    order the updates,
> 2. Or would it end up with either of {x:1, y:2} or {x:1, z:3}?
>
> Here the current async index update logic ensures that it sees the
> eventually expected order of changes and hence would be consistent
> with the repository state.
>
> Backup and Restore
> ---------------------------
>
> Would the backup now involve backing up ES index files from each
> cluster node? Or, assuming full replication, would it involve backing up
> files from any one of the nodes? Would the backup be in sync with the last
> changes done in the repository (assuming a sudden shutdown where changes got
> committed to the repository but not yet to any index)?
>
> Here the current approach of storing index files as part of MVCC storage
> ensures that the index state is consistent with some "checkpointed" state in
> the repository. Post restart it would eventually catch up with the
> current repository state and hence would not require a complete rebuild
> of the index in case of unclean shutdowns.
>
>
> Chetan Mehrotra
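
To make the "Order of index update" question in the quoted message concrete, here is a minimal sketch of the two outcomes for /a. All names in it (IndexOrderingSketch, ExternalIndex, merge) are hypothetical and are not Oak or ES API. It only assumes an external index that stores one full document per path and lets the last write win.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names exist in Oak or ES.
class IndexOrderingSketch {

    // Assumed model of an external index: one full document per path,
    // and a later write for the same path replaces the earlier one.
    static final class ExternalIndex {
        private final Map<String, Map<String, Integer>> docs = new LinkedHashMap<>();

        void put(String path, Map<String, Integer> doc) {
            docs.put(path, new LinkedHashMap<>(doc));
        }

        Map<String, Integer> get(String path) {
            return docs.get(path);
        }
    }

    // Combines two non-conflicting changes on top of a base state,
    // mirroring the "End State" from the mail.
    static Map<String, Integer> merge(Map<String, Integer> base,
                                      Map<String, Integer> n1,
                                      Map<String, Integer> n2) {
        Map<String, Integer> merged = new LinkedHashMap<>(base);
        merged.putAll(n1);
        merged.putAll(n2);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> base = Map.of("x", 1);
        Map<String, Integer> n1View = Map.of("x", 1, "y", 2); // N1's local view of /a
        Map<String, Integer> n2View = Map.of("x", 1, "z", 3); // N2's local view of /a

        // Option 2 from the mail: each node pushes its own local view,
        // so whichever push arrives last decides the indexed document.
        ExternalIndex direct = new ExternalIndex();
        direct.put("/a", n1View);
        direct.put("/a", n2View);
        System.out.println("direct push:  " + direct.get("/a"));

        // Indexing from the merged repository state instead.
        ExternalIndex fromMerged = new ExternalIndex();
        fromMerged.put("/a", merge(base, n1View, n2View));
        System.out.println("merged state: " + fromMerged.get("/a"));
    }
}
```

In this model the direct push loses either y or z depending on arrival order, while indexing from the merged state keeps both. Whether a real ES-based setup behaves like the first case depends on how updates are routed and applied, which is exactly the open question.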
