Hi,

keyword extraction from very large files consumes a lot of memory, because all 
extracted keywords have to be kept in memory (I'm not sure whether this is a 
Lucene issue or a consequence of how it is being used). You have three options here:
- use all keywords, but live with the memory issue
- restrict the number of keywords, but live with only partially indexed files
- disable keyword extraction by using an index configuration for nt:resource 
in which only a dummy, non-existing property is indexed (see the sketch below)
Imho the second is the worst option because it is not reliable.
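
For the third option, a minimal indexing configuration could look roughly like 
the following; the property name "dummyProperty" is just a placeholder I made up, 
and the file has to be wired in via the indexingConfiguration parameter of the 
SearchIndex element in repository.xml:

  <?xml version="1.0"?>
  <!DOCTYPE configuration SYSTEM
      "http://jackrabbit.apache.org/dtd/indexing-configuration-1.2.dtd">
  <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
    <!-- only "index" a property that never exists on nt:resource,
         so jcr:data is never handed to the text extractors -->
    <index-rule nodeType="nt:resource">
      <property>dummyProperty</property>
    </index-rule>
  </configuration>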

The second time I saw increased memory consumption was when Lucene index files 
were merged. I didn't have the time to investigate that further; increasing the 
heap a bit helped, so I don't know the cause there.
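
In case the merges do turn out to be the bottleneck for you, the SearchIndex 
element in repository.xml exposes a few merge-related parameters (at least in 
the versions I have used); the values below are only illustrative, not 
recommendations:

  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <!-- other parameters omitted -->
    <!-- merge fewer segments at once -->
    <param name="mergeFactor" value="5"/>
    <!-- cap the size of segments produced by merging -->
    <param name="maxMergeDocs" value="100000"/>
  </SearchIndex>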

Kind regards, Robert

-----Original Message-----
From: pgupta [mailto:[email protected]] 
Sent: Friday, 6 September 2013 05:36
To: [email protected]
Subject: Re: Huge memory usage while re-indexing

Unfortunately not, as our users can potentially construct a search query using 
any property.

Do you think it's the number of indexable properties causing the memory issues? 
I was thinking it was perhaps more to do with the keyword extraction from file 
contents. We came across a somewhat similar memory issue when we increased the 
number of words used for indexing from 10,000 to a million.
This again caused a huge memory spike (~2 GB) while importing a large text file 
(~100 MB). Because of this we had to revert that setting to its default value. 
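
For context, the setting in question is, if I recall the parameter name 
correctly, maxFieldLength on the SearchIndex (Lucene's per-field token limit, 
default 10,000); the change looked roughly like this in repository.xml:

  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <!-- default is 10000 tokens per field; at 1,000,000 the full extracted
         text of large files is buffered and indexed -->
    <param name="maxFieldLength" value="1000000"/>
  </SearchIndex>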

So my initial thinking is that either Lucene indexing (or how it's being used 
by Jackrabbit) is not scalable, or our configuration is not optimal to handle 
these cases.


