Re: OakDirectory with cluster

Chetan Mehrotra Tue, 04 Aug 2015 06:48:14 -0700

Lucene indexing in Oak is done by a single thread in a cluster. The
AsyncIndex job, which is responsible for performing the Lucene indexing,
is scheduled to run as a singleton in the cluster [1]. For this, Oak
depends on the embedding application. For example, when running in Apache
Sling the job is registered with the following properties:


* scheduler.period=5
* scheduler.runOn=SINGLE
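
As a rough illustration, the two properties above could be built like this
before registering the job as an OSGi service. This is only a sketch: the
class name AsyncIndexJobProps is made up for the example, and it is not the
code Oak itself uses for the registration.

```java
import java.util.Dictionary;
import java.util.Hashtable;

// Hypothetical helper for illustration only; not part of Oak or Sling.
final class AsyncIndexJobProps {

    // Builds the service properties listed above.
    static Dictionary<String, Object> build() {
        Dictionary<String, Object> props = new Hashtable<>();
        // Run every 5 seconds (the Sling scheduler expects a Long here).
        props.put("scheduler.period", 5L);
        // Ask the scheduler to run the job on a single cluster node only.
        props.put("scheduler.runOn", "SINGLE");
        return props;
    }
}
```

In an OSGi container the resulting dictionary would typically be passed to
BundleContext.registerService(Runnable.class, job, props), with job being
the Runnable that performs the indexing.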

The Sling scheduler component then ensures that this job (i.e.
indexing) is performed on only one node in the cluster. Oak internally
also ensures that only one indexing run is active at a time, via some
state management in the hidden :async node. So essentially, by design, we
enforce that no concurrent indexing happens, hence the use of
NoLockFactory.
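
To make the "only one run at a time" idea concrete, here is a minimal
in-memory sketch of such a guard. It is an assumption-laden stand-in: the
class AsyncLeaseSketch and its methods are invented for this example, and
Oak keeps the real state in the repository under the :async node rather
than in a Java field.

```java
import java.util.Objects;

// Hypothetical in-memory stand-in for the state kept under :async.
final class AsyncLeaseSketch {

    // Id of the cluster node currently indexing, or null if nobody is.
    private String owner;

    // Returns true only if no other node currently holds the lease.
    synchronized boolean tryAcquire(String nodeId) {
        Objects.requireNonNull(nodeId, "nodeId");
        if (owner != null) {
            return false; // another indexing run is already active
        }
        owner = nodeId;
        return true;
    }

    // Releases the lease, but only if the caller is the current holder.
    synchronized boolean release(String nodeId) {
        if (!Objects.equals(owner, nodeId) || owner == null) {
            return false;
        }
        owner = null;
        return true;
    }
}
```

A node would call tryAcquire before an indexing cycle and skip the cycle
when it returns false, so at most one IndexWriter is ever open on the
directory and no Lucene-level lock is needed.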
Chetan Mehrotra


On Tue, Aug 4, 2015 at 5:40 PM, Shinichiro Abe
<shinichiro.ab...@gmail.com> wrote:
> Hello,
>
> If I understand correctly, with OakDirectory a Lucene index is stored as a
> Blob in the storage.
> In an Oak cluster, each Oak instance has its own LuceneIndexEditor in which
> an IndexWriter is working.
> In that case, when multiple IndexWriters update documents in one
> OakDirectory at the same time, I think it will lead to index corruption,
> even though OakDirectory controls locking with NoLockFactory.
> How are you avoiding this? Does Oak have an OakDirectory per cluster id?
> Does the Oak implementation block other writers during an update?
> I'd like to understand the oak-lucene design. Which code should I look at?
> Background: I'd like to get a hint from the Oak team for CONNECTORS-1219,
> which I'm working on. I have a problem in a cluster (multiprocess) setup:
> currently I hit index corruption when multiple IndexWriters write to an
> index.
>
> Thanks in advance.
> Shinichiro Abe
