On Thu, Aug 11, 2016 at 3:03 PM, Ian Boston <i...@tfd.co.uk> wrote:
> Both Solr Cloud and ES address this by sharding and
> replicating the indexes, so that all commits are soft, instant and real
> time. That introduces problems.
...
> Both Solr Cloud and ES address this by sharding and
> replicating the indexes, so that all commits are soft, instant and real
> time.

This would really be useful. However, I have a couple of aspects to clear up.

Index Update Guarantee
--------------------------------

Let's say the commit succeeds, then we update the index, and the index
update fails for some reason. Would that update be missed, or is there
some mechanism to recover? I am not very sure about the WAL here; that
may be the answer, but I wanted to confirm.

In Oak, the way async index update works based on checkpoints ensures
that the index would "eventually" contain the right data and no update
would be lost. If there is a failure in an index update, that cycle
fails and the next cycle starts again from the same base state.

Order of index update
-----------------------------

Let's say I have 2 cluster nodes where the same node is being modified:

Original state       /a {x:1}
Cluster Node N1  -   /a {x:1, y:2}
Cluster Node N2  -   /a {x:1, z:3}
End State            /a {x:1, y:2, z:3}

At the Oak level both commits would succeed, as there is no conflict.
However, N1 and N2 would not see each other's updates immediately; that
would depend on the background read. So in this case, what would the
index update look like?

1. Would the index update for specific paths go to some master which
   would order the updates?
2. Or would it end up with either {x:1, y:2} or {x:1, z:3}?

Here the current async index update logic ensures that it sees the
eventually expected order of changes and hence stays consistent with
the repository state.

Backup and Restore
---------------------------

Would the backup now involve backing up the ES index files from each
cluster node? Or, assuming full replication, would it involve backing up
the files from any one of the nodes? Would the backup be in sync with
the last changes done in the repository (assuming a sudden shutdown
where changes got committed to the repository but not yet to any index)?

Here the current approach of storing index files as part of the MVCC
storage ensures that the index state is consistent with some
"checkpointed" state in the repository.
And after a restart it would eventually catch up with the current
repository state, and hence would not require a complete rebuild of the
index in case of unclean shutdowns.

Chetan Mehrotra
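
For illustration only, here is a minimal sketch of the checkpoint-based retry behaviour described above. It is not Oak's actual API: the names `AsyncIndexer`, `run_cycle`, `IndexUpdateFailed`, and the list-of-change-sets model of the repository are all assumptions made for this example. The only point it shows is that the checkpoint advances when a cycle fully succeeds, so a failed cycle is retried from the same base state and no update is lost.

```python
class IndexUpdateFailed(Exception):
    """Raised by this hypothetical indexer when a cycle cannot complete."""


class AsyncIndexer:
    """Toy model of a checkpoint-based async index update (not Oak code)."""

    def __init__(self):
        self.checkpoint = 0  # number of revisions the index already reflects
        self.index = {}      # path -> properties as of self.checkpoint

    def run_cycle(self, revisions, fail=False):
        """Index all change sets between the checkpoint and the head.

        `revisions` is a list of {path: properties} change sets in
        repository order. `fail=True` simulates an index update failure.
        """
        head = len(revisions)
        staged = {path: dict(props) for path, props in self.index.items()}
        for change_set in revisions[self.checkpoint:head]:
            for path, props in change_set.items():
                staged.setdefault(path, {}).update(props)
        if fail:
            # Nothing was published: index and checkpoint stay untouched.
            raise IndexUpdateFailed("simulated failure before publishing")
        # Publish the new index state and advance the checkpoint together.
        self.index = staged
        self.checkpoint = head


# Repository history for /a: initial state, then the N1 and N2 changes.
revisions = [{"/a": {"x": 1}}, {"/a": {"y": 2}}, {"/a": {"z": 3}}]

indexer = AsyncIndexer()
indexer.run_cycle(revisions[:1])          # index the original state
try:
    indexer.run_cycle(revisions, fail=True)  # this cycle fails
except IndexUpdateFailed:
    pass                                  # checkpoint is still 1
indexer.run_cycle(revisions)              # next cycle restarts from same base
```

After the final cycle the toy index for `/a` holds all three properties, matching the end state of the repository, even though one cycle in between failed.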