Hi,

On Wed, Nov 20, 2013 at 12:09 PM, Marcel Reutegger <[email protected]> wrote:
>> The async index update is designed so that it should work correctly
>> even when run concurrently on multiple cluster nodes. It uses
>> optimistic locking to prevent conflicting updates.
>
> I'm confident this works, but I'm a bit concerned about duplicate work
> being done. Doesn't the probability of concurrent updates increase
> with every cluster node we add?

The idea behind the design was that we'd make the update interval
dependent on the number of cluster nodes. I.e. instead of an interval
of t seconds that's independent of the cluster size, we'd configure
the index update interval to (roughly) n*t, where n is the size of
the cluster. That way an index update would on average get triggered
once every t seconds across the cluster.

> A while ago I saw them more frequently. See
> https://issues.apache.org/jira/browse/OAK-1166 for more details.
> After https://issues.apache.org/jira/browse/OAK-1198 they are now
> less frequently.

OK, thanks for the pointers.

> But even if there are no warnings it may still mean
> unnecessary work is done and then discarded. Though, I understand
> most of the discarded lucene changes were already persisted to a
> branch and will have to be garbage collected. Is this correct?

Correct. I'd expect this to be a problem only during large imports,
when big index updates are needed and the likelihood of concurrent
work is much higher. The flag I proposed in the earlier message
should help with such cases.

BR,

Jukka Zitting
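
For illustration, the n*t interval scaling described above could be
sketched roughly as follows. This is only a minimal sketch: the class
and method names are hypothetical and not part of Oak, and it assumes
the cluster size n is known to each node when the interval is set.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, NOT an Oak API: scales the per-node async index
// update interval with the cluster size, as described in the message.
public final class AsyncIndexInterval {

    private AsyncIndexInterval() {
    }

    /**
     * Returns the per-node interval n*t in milliseconds, where t is the
     * desired cluster-wide interval and n is the number of cluster nodes.
     */
    public static long perNodeIntervalMillis(long baseIntervalSeconds, int clusterSize) {
        if (baseIntervalSeconds <= 0 || clusterSize <= 0) {
            throw new IllegalArgumentException("interval and cluster size must be positive");
        }
        long baseMillis = TimeUnit.SECONDS.toMillis(baseIntervalSeconds);
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(baseMillis, (long) clusterSize);
    }

    /**
     * Average number of update triggers per second across the whole
     * cluster: n nodes, each firing once every n*t seconds.
     */
    public static double clusterTriggersPerSecond(long baseIntervalSeconds, int clusterSize) {
        double perNodeSeconds = perNodeIntervalMillis(baseIntervalSeconds, clusterSize) / 1000.0;
        return clusterSize / perNodeSeconds;
    }
}
```

With t = 5 seconds and n = 4 nodes, each node would run its update
every 20 seconds, which averages out to one trigger per 5 seconds
cluster-wide. The node schedules are not coordinated in this sketch,
so two nodes can still overlap occasionally; the optimistic locking
is what keeps that case correct.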
