On Thu, Aug 11, 2016 at 3:03 PM, Ian Boston <i...@tfd.co.uk> wrote:
> Both Solr Cloud and ES address this by sharding and
> replicating the indexes, so that all commits are soft, instant and real
> time. That introduces problems.
...
> Both Solr Cloud and ES address this by sharding and
> replicating the indexes, so that all commits are soft, instant and real
> time.

This would really be useful. However I have couple of aspects to clear

Index Update Gurantee
--------------------------------

Lets say if commit succeeds and then we update the index and index
update fails for some reason. Then would that update be missed or
there can be some mechanism to recover. I am not very sure about WAL
here that may be the answer here but still confirming.

In Oak with the way async index update works based on checkpoint its
ensured that index would "eventually" contain the right data and no
update would be lost. if there is a failure in index update then that
would fail and next cycle would start again from same base state

Order of index update
-----------------------------

Lets say I have 2 cluster nodes where same node is being performed

Original state /a {x:1}

Cluster Node N1 - /a {x:1, y:2}
Cluster Node N2 - /a {x:1, z:3}

End State /a {x:1, y:2, z:3}

At Oak level both the commits would succeed as there is no conflict.
However N1 and N2 would not be seeing each other updates immediately
and that would depend on background read. So in this case how would
index update would look like.

1. Would index update for specific paths go to some master which would
order the update
2. Or it would end up with with either of {x:1, y:2} or {x:1, z:3}

Here current async index update logic ensures that it sees the
eventually expected order of changes and hence would be consistent
with repository state.

Backup and Restore
---------------------------

Would the backup now involve backup of ES index files from each
cluster node. Or assuming full replication it would involve backup of
files from any one of the nodes. Would the back be in sync with last
changes done in repository (assuming sudden shutdown where changes got
committed to repository but not yet to any index)

Here current approach of storing index files as part of MVCC storage
ensures that index state is consistent to some "checkpointed" state in
repository. And post restart it would eventually catch up with the
current repository state and hence would not require complete rebuild
of index in case of unclean shutdowns


Chetan Mehrotra

Reply via email to