Hi everyone,
I had been using Jackrabbit 1.5.6 in my application for a long time and
recently upgraded to 2.2.7. Since the upgrade I am seeing serious
performance problems when *creating a new node with an attachment* in the
repository.
Here is the <SearchIndex> section of the *repository.xml* file used with
each of these Jackrabbit versions.
*For JR 1.5.6:*
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index"/>
<param name="supportHighlighting" value="true"/>
<param name="useCompoundFile" value="true"/>
<param name="minMergeDocs" value="100"/>
<param name="volatileIdleTime" value="3"/>
<param name="maxMergeDocs" value="2147483647"/>
<param name="mergeFactor" value="10"/>
<param name="maxFieldLength" value="2147483647"/>
<param name="bufferSize" value="10"/>
<param name="cacheSize" value="1000"/>
<param name="forceConsistencyCheck" value="false"/>
<param name="enableConsistencyCheck" value="false"/>
<param name="autoRepair" value="true"/>
<param name="analyzer"
value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
<param name="queryClass"
value="org.apache.jackrabbit.core.query.QueryImpl"/>
<param name="respectDocumentOrder" value="false"/>
<param name="resultFetchSize" value="2147483647"/>
<param name="extractorPoolSize" value="0"/>
<param name="extractorTimeout" value="100"/>
<param name="extractorBackLogSize" value="100"/>
<param name="textFilterClasses" value="
org.apache.jackrabbit.extractor.PlainTextExtractor,
com.xxx.dms.indexing.excel.MsExcelTextFilter,
org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
com.xxx.dms.indexing.word.MsWordTextFilter,
com.xxx.dms.indexing.msg.afcl.domain.MSGTextFilter,
com.xxx.dms.indexing.pdf.AxPdfTextExtractor,
org.apache.jackrabbit.extractor.HTMLTextExtractor,
org.apache.jackrabbit.extractor.XMLTextExtractor,
org.apache.jackrabbit.extractor.RTFTextExtractor,
org.apache.jackrabbit.extractor.OpenOfficeTextExtractor
"/>
</SearchIndex>
*For JR 2.2.7:*
<SearchIndex
class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index"/>
<param name="supportHighlighting" value="true"/>
<param name="useCompoundFile" value="true"/>
<param name="minMergeDocs" value="100"/>
<param name="volatileIdleTime" value="3"/>
<param name="maxMergeDocs" value="2147483647"/>
<param name="mergeFactor" value="10"/>
<param name="maxFieldLength" value="2147483647"/>
<param name="bufferSize" value="10"/>
<param name="cacheSize" value="1000"/>
<param name="forceConsistencyCheck" value="false"/>
<param name="enableConsistencyCheck" value="false"/>
<param name="autoRepair" value="true"/>
<param name="analyzer"
value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
<param name="queryClass"
value="org.apache.jackrabbit.core.query.QueryImpl"/>
<param name="respectDocumentOrder" value="false"/>
<param name="resultFetchSize" value="2147483647"/>
<param name="extractorPoolSize" value="0"/>
<param name="extractorTimeout" value="100"/>
<param name="extractorBackLogSize" value="100"/>
<param name="maxExtractLength" value="10240000" />
<!-- The Tika text extractor is used by default. Is something going wrong
with the Tika extractor? Can I replace or configure it to use my custom
extractor classes for specific file types? -->
</SearchIndex>
I have tested the above use case against both Jackrabbit versions in exactly
the same environment. My test scenarios covered different attachment file
types, different file sizes, and concurrent user requests.
Is there any known open issue with Jackrabbit 2.2.7 regarding performance or
indexing? Can this be resolved by changing the indexing mechanism or the text
extractor? Please advise on how to resolve this issue.
Thanks & Regards,
--
Vishal Shukla