Rebuilding index

Nelson Takashi Omori Thu, 22 Nov 2012 10:54:42 -0800

Hi All,

I'm using Jackrabbit 2.4.3 and my repository has approximately 110 thousand nodes. Of these, about 10 thousand nodes have binary values, whose content needs to be extracted using Tika and indexed in Lucene.

I decided to delete the index to make Jackrabbit create it again. The problem is the time this operation takes. I waited 3 hours and the repository still wasn't initialized (I don't know exactly how long it takes to complete the repository initialization, because I stopped the process). With Tika's text extraction disabled, it took 5 minutes, so I concluded that the problem is the time Tika takes to extract text from the 10 thousand documents.

If the index becomes inconsistent and I have to run a rebuild, my client doesn't want to wait more than 3 hours to start using the system. So I'm planning to create a subclass of org.apache.jackrabbit.core.query.lucene.SearchIndex and modify how the index is re-created. To give my client fast access to the repository, I'll first skip the text extraction and build the index from the normal properties only. With that index in place, I can give my client access to the repository, and he can do many things using only the normal properties. Then, in the background, I'll run the text extraction for each document and update its Lucene document with the extracted text.
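The plan above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Jackrabbit code: the class `TwoPhaseIndexer`, its `add`, `lookup` and `awaitExtraction` methods, and the `fulltext` field name are all hypothetical, a plain in-memory map stands in for the Lucene index, and the `extractor` function stands in for Tika.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical two-phase indexer; not part of Jackrabbit or Lucene. */
class TwoPhaseIndexer {
    // nodeId -> indexed fields; a stand-in for the real Lucene index
    private final Map<String, Map<String, String>> index = new ConcurrentHashMap<>();
    // Background threads that run the slow text extraction
    private final ExecutorService extractorPool;
    // Stand-in for the text extractor (Tika in the real setup)
    private final Function<byte[], String> extractor;

    TwoPhaseIndexer(int threads, Function<byte[], String> extractor) {
        this.extractorPool = Executors.newFixedThreadPool(threads);
        this.extractor = extractor;
    }

    /**
     * Phase 1: index the normal properties right away, so the node is searchable.
     * Phase 2: if the node has binary content, queue its extraction in the background.
     */
    void add(String nodeId, Map<String, String> properties, byte[] binary) {
        Map<String, String> fields = new ConcurrentHashMap<>(properties);
        index.put(nodeId, fields);
        if (binary != null) {
            extractorPool.submit(() -> {
                // Update the already-indexed entry once the text is available
                fields.put("fulltext", extractor.apply(binary));
            });
        }
    }

    /** Returns the fields indexed so far for a node, or null if unknown. */
    Map<String, String> lookup(String nodeId) {
        return index.get(nodeId);
    }

    /** Stops accepting new work and waits for queued extractions to finish. */
    boolean awaitExtraction(long timeout, TimeUnit unit) throws InterruptedException {
        extractorPool.shutdown();
        return extractorPool.awaitTermination(timeout, unit);
    }
}
```

In this sketch a node is visible through `lookup` as soon as `add` returns, and its `fulltext` field appears later, once the background task has run. Whether SearchIndex can be subclassed to behave this way is exactly what I'm unsure about.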


I have some questions about it.

1) Reading the source code, I see that Jackrabbit uses LazyTextExtractorField (and other classes) to run the extraction in a separate thread. Doesn't that do exactly what I want? Even so, I waited 3 hours and the repository wasn't initialized and ready to use. Is that normal?

2) Is what I'm planning the best approach? Has anybody done something similar?


Thanks,

Nelson
