Hi Jukka,

In my case (as described in another mail: a repository with 2 million
documents) I have these parameters for the Lucene indexer
to re-index the workspace:

<param name="minMergeDocs" value="1000"/>
<param name="maxMergeDocs" value="1000000"/>
<param name="mergeFactor" value="10"/> 

So my understanding is that the merger should merge 10 index folders of
1000 nodes each into a single index folder with 1000*10 nodes,
then likewise merge 10 index folders of 1000*10 nodes each, and so on
until the maxMergeDocs size is reached.
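That expectation can be put into numbers. The following is my own back-of-the-envelope sketch (not Jackrabbit code) of the segment count a logarithmic merge policy should leave behind: at each segment-size level there are at most mergeFactor - 1 unmerged segments, so the total should be on the order of a few dozen, not 700. The parameter values are taken from the configuration above.

```python
import math

def expected_segments(num_docs, min_merge_docs=1000, merge_factor=10):
    """Rough upper bound on index-folder count under a logarithmic
    merge policy: at each size level (1000, 10*1000, 100*1000, ...)
    at most merge_factor - 1 segments wait to be merged."""
    if num_docs <= 0:
        return 0
    levels = math.floor(math.log(max(num_docs / min_merge_docs, 1),
                                 merge_factor)) + 1
    return levels * (merge_factor - 1)

# For the 2-million-document repository described above:
print(expected_segments(2_000_000))  # → 36
```

A result of about 36 is in line with the 40-90 folders observed after a restart, which is why the 700 folders during re-indexing look like the merger is falling behind.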

When I re-index the repository, it runs for about 6-7 hours and then I get
the OutOfMemoryError, with about 700 index folders on disk.
It is not comprehensible to me why there are so many index folders.
If I then restart the repository without deleting the index folders,
they get merged down to about 40-90 folders.

Maybe there is a bug in the merger, so that it does not work correctly while
the text filters scan the documents during a re-index.
When I add new documents to the repository afterwards, the merger works fine
as soon as I call the save() method on the session.

I don't know how to get a useful debug trace from the re-index process,
because if I enable the DEBUG level in log4j, the log file becomes
very large after about 7 hours.
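One way to keep the trace manageable might be to enable DEBUG only for the query/index layer instead of the whole repository. A sketch of a log4j.properties fragment; the package name org.apache.jackrabbit.core.query.lucene is my assumption about where the indexing classes live:

```properties
# Keep everything else at INFO so the log file stays small
log4j.rootLogger=INFO, file

# Verbose output only for the Lucene-based indexing/query classes
log4j.logger.org.apache.jackrabbit.core.query.lucene=DEBUG
```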

Claus
-----Original Message-----
From: Jukka Zitting [mailto:[EMAIL PROTECTED]]
Sent: Monday, 28 August 2006 13:51
To: [email protected]
Subject: Re: problems with re-indexing the workspace

Hi,

On 8/28/06, Christian Zanata <[EMAIL PROTECTED]> wrote:
> [ERROR] 20060825 17:06:40
> (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
> Synchronous EventConsumer threw exception. java.lang.OutOfMemoryError
>
> This error seems to happen when the repository tries to re-index the
> workspace, but we don't have any more stack traces.
> [...]
> Could anybody help us to understand what's happening?

There are two likely causes for that: either Lucene is running out of
memory while merging the index segments, or one of the index filters
runs out of memory trying to parse one of the binary documents in the
repository. Without a complete stack trace it is difficult to
determine the exact cause of the problems.

You might want to try modifying the Lucene parameters in the
SearchIndex configuration. See the Lucene documentation for options
that affect memory usage.
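For example, the SearchIndex element in the workspace configuration could be tuned roughly like this. This is only a sketch: minMergeDocs, maxMergeDocs, and mergeFactor are the parameters quoted earlier in this thread, and lowering minMergeDocs should reduce how many documents Lucene buffers in RAM before writing a segment, at the cost of slower indexing. Treat the concrete values as illustrative, not as recommended settings.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Fewer in-memory documents per segment: less RAM, more I/O -->
  <param name="minMergeDocs" value="100"/>
  <param name="maxMergeDocs" value="1000000"/>
  <param name="mergeFactor" value="10"/>
</SearchIndex>
```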

BR,

Jukka Zitting

-- 
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development
