The mergeFactor is way too high. With this setup, index merging only takes place after 1000 index segments have been created. That is also why there are so many directories in the index folder. The default value of 10 is usually a good choice and should only be changed in rare cases.

Can you please try a re-index with a mergeFactor of 10 and, if you still run into an out-of-memory error, file a Jira issue?
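
For reference, a rough sketch of how the adjusted SearchIndex section in workspace.xml might look. Only mergeFactor differs from the configuration you posted; the parameters omitted here are assumed to stay exactly as you have them.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <!-- textFilterClasses, useCompoundFile, cacheSize etc. omitted here; keep them as they are -->
    <param name="minMergeDocs" value="1000"/>
    <!-- back to the default of 10 instead of 1000 -->
    <param name="mergeFactor" value="10"/>
</SearchIndex>
```

Since the index folder is rebuilt at startup when it is missing, deleting it again before the restart should be enough to trigger the re-index with the new value.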

Thanks

regards
 marcel

KÖLL Claus wrote:
I ran some performance tests with a repository that holds about 2 million
different files (doc, xls, txt and ppt).
I am very satisfied with the performance...
But now I ran a test that re-indexes the whole repository, to cover the scenario
where there are problems with the index at run time.
I deleted the index folder and restarted the repository.
On my test PC (Windows 2003, 4 GB RAM, 150 GB hard disk) I always run into an OutOfMemory exception while the index is being created at repository startup.
I have set the /3GB flag in boot.ini to get a larger initial heap size.
The current Java start parameters are -Xms1550m -Xmx3000m
The workspace.xml file has these parameters:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
    <param name="useCompoundFile" value="true" />
    <param name="minMergeDocs" value="1000" />
    <param name="mergeFactor" value="1000" />
    <param name="cacheSize" value="1000"/>
    <param name="respectDocumentOrder" value="false" />
    <param name="autoRepair" value="true"/>
    <param name="forceConsistencyCheck" value="false"/>
</SearchIndex>

What seems strange to me is that during indexing Lucene creates about 600-700 directories under the index folder in the workspace directory, the redo.log grows to about 25 MB, and then I get an OutOfMemory exception.
During the initial filling of the repository, merging of the index
folders/files worked fine,
but now it seems that the merger does not work.

If I restart the repository after the exception occurs, the index folders/files
are merged into about 20-30 folders, but
the repository is not completely indexed.

thanks for help

claus
