The mergeFactor is way too high. With this setup index merging will
only take place after 1000 index segments have been created. That's
also the reason why there are so many directories in the index folder.
The default value of 10 is usually a good choice and should only be
changed in rare cases.
Can you please try a re-index with a mergeFactor of 10 and, if you
still run into an out of memory error, file a Jira issue?
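For reference, a minimal sketch of the changed SearchIndex section, assuming
everything else in your workspace.xml stays as you posted it. Only the
mergeFactor value differs; the other parameters are left out here just to
keep the fragment short:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- keep your remaining parameters (textFilterClasses, cacheSize, etc.) unchanged -->
  <param name="minMergeDocs" value="1000"/>
  <!-- back to the default of 10, so segments are merged far more often -->
  <param name="mergeFactor" value="10"/>
</SearchIndex>
```

You will need to delete the index folder again and restart the repository
so the index is rebuilt with the new setting.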
Thanks
regards
marcel
KÖLL Claus wrote:
I ran some performance tests with a repository that holds about 2 million
different files (doc, xls, txt and ppt).
I am very satisfied with the performance ...
But now I ran a test that re-indexes the whole repository, to cover the scenario
where the index develops problems at run time.
I deleted the index folder and restarted the repository.
My test PC configuration: Windows 2003, 4 GB RAM, 150 GB hard disk.
I always run into an OutOfMemoryError while the index is created at repository startup.
I have set the /3GB flag in boot.ini to allow a larger heap size.
The current Java start parameters are:
-Xms1550m -Xmx3000m
The workspace.xml file has these parameters:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index"/>
<param name="textFilterClasses"
value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
<param name="useCompoundFile" value="true" />
<param name="minMergeDocs" value="1000" />
<param name="mergeFactor" value="1000" />
<param name="cacheSize" value="1000"/>
<param name="respectDocumentOrder" value="false" />
<param name="autoRepair" value="true"/>
<param name="forceConsistencyCheck" value="false"/>
</SearchIndex>
What seems strange to me is that during indexing Lucene creates about 600 - 700 directories under the
index folder in the workspace directory, the redo.log grows to about 25 MB, and then I get an OutOfMemoryError.
When the repository was initially filled, merging of the index
folders/files worked fine,
but now it seems that the merger does not work.
If I restart the repository after the error occurs, the index folders/files
are merged into about 20-30 folders, but
the repository is not completely indexed.
thanks for help
claus