Re: Problems with re-index a huge repository

Marcel Reutegger Thu, 10 Aug 2006 00:12:18 -0700

The mergeFactor is way too high. With this setup, index merging will only take place after 1000 index segments have been created. That's also the reason why there are so many directories in the index folder. The default value of 10 is usually a good choice and should only be changed in rare cases.

Can you please try a re-index with a mergeFactor of 10 and, if you still run into an out of memory error, file a Jira issue?
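
For illustration, the relevant part of the SearchIndex section quoted below could look roughly like this. Only mergeFactor differs from the configuration in the original mail; leaving minMergeDocs at 1000 is an assumption, since only mergeFactor is discussed here.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <!-- remaining params (textFilterClasses, useCompoundFile, ...) stay as in the original mail -->
    <param name="minMergeDocs" value="1000" />
    <!-- was 1000 in the original mail; 10 is the default -->
    <param name="mergeFactor" value="10" />
</SearchIndex>
```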


Thanks

regards
 marcel

KÖLL Claus wrote:

I ran some performance tests with a repository that holds about 2 million
different files (doc, xls, txt and ppt).
I am very satisfied with the performance ...
But now I ran a test to re-index the whole repository, to cover the scenario
where there are problems with the index at run time.
I deleted the index folder and restarted the repository.

My test PC configuration: Windows 2003, 4 GB RAM, 150 GB hard disk. I always run into an OutOfMemory exception during index creation when the repository starts up.

I have set the /3GB flag in boot.ini to get a larger initial heap size.

The current Java start parameters are -Xms1550m -Xmx3000m

The workspace.xml file has these parameters:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
    <param name="useCompoundFile" value="true" />
    <param name="minMergeDocs" value="1000" />
    <param name="mergeFactor" value="1000" />
    <param name="cacheSize" value="1000"/>
    <param name="respectDocumentOrder" value="false" />
    <param name="autoRepair" value="true"/>
    <param name="forceConsistencyCheck" value="false"/>
</SearchIndex>

What seems strange to me is that during indexing Lucene creates about 600 - 700 directories under the index folder in the workspace directory, the redo.log grows to about 25 MB, and then I get an OutOfMemory exception.

During the initial filling of the repository, merging of the index
folders/files worked fine,
but now it seems that the merger does not work.

If I restart the repository after the exception occurs, the index folders/files
are merged into about 20-30 folders, but
the repository is not completely indexed.

thanks for help

claus
