Hi,

On Wed, Nov 20, 2013 at 12:09 PM, Marcel Reutegger <[email protected]> wrote:
>> The async index update is designed so that it should work correctly
>> even when run concurrently on multiple cluster nodes. It uses
>> optimistic locking to prevent conflicting updates.
>
> I'm confident this works, but I'm a bit concerned about duplicate work
> being done. Doesn't the probability of concurrent updates increase
> with every cluster node we add?

The idea behind the design was that we'd make the update interval
dependent on the number of cluster nodes. I.e. instead of an interval
of t seconds that's independent of the cluster size, we'd configure
the index update interval to (roughly) n*t, where n is the size of the
cluster. That way an index update would on average get triggered
once every t seconds across the cluster.
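
To make the arithmetic concrete, here is a minimal sketch of that
scaling. The class and method names are hypothetical and not part of
Oak, and t = 5 seconds is just an example value. It also assumes the
nodes' schedules are not synchronized with each other.

```java
public final class AsyncIndexInterval {

    /** Per-node interval n*t, so the cluster as a whole averages one update per t. */
    public static long perNodeIntervalSeconds(long targetSeconds, int clusterSize) {
        if (targetSeconds <= 0 || clusterSize < 1) {
            throw new IllegalArgumentException("need t > 0 and n >= 1");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(targetSeconds, (long) clusterSize);
    }

    /** Average number of updates triggered per second across all nodes. */
    public static double clusterUpdatesPerSecond(long perNodeIntervalSeconds, int clusterSize) {
        return (double) clusterSize / perNodeIntervalSeconds;
    }

    public static void main(String[] args) {
        long t = 5; // example target: one update every 5 seconds cluster-wide
        for (int n : new int[] {1, 2, 4, 8}) {
            long interval = perNodeIntervalSeconds(t, n);
            System.out.printf("n=%d: per-node interval=%ds, cluster rate=%.2f/s%n",
                    n, interval, clusterUpdatesPerSecond(interval, n));
        }
    }
}
```

The cluster-wide rate stays at 1/t regardless of n, so adding nodes
should not by itself increase how often updates are attempted.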

> A while ago I saw them more frequently. See
> https://issues.apache.org/jira/browse/OAK-1166 for more details.
> After https://issues.apache.org/jira/browse/OAK-1198 they are now
> less frequent.

OK, thanks for the pointers.

> But even if there are no warnings it may still mean
> unnecessary work is done and then discarded. Though, I understand
> most of the discarded lucene changes were already persisted to a
> branch and will have to be garbage collected. Is this correct?

Correct. I'd expect this to be a problem only during large imports
when big index updates are needed (and when the likelihood of
concurrent work is much increased). The flag I proposed in the earlier
message should help with such cases.

BR,

Jukka Zitting
