Could you index only certain properties of a node, for example only those that are relevant to your searches?
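As a rough sketch of what I mean, assuming your searchable content lives on nt:unstructured nodes and that the only property you search on is called "title" (both are placeholders, since I don't know your content model), an indexing configuration file could look something like this:

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- Assumption: nt:unstructured and "title" are placeholders;
       substitute your own node type and property names. -->
  <index-rule nodeType="nt:unstructured">
    <!-- Only the listed properties of matching nodes are indexed. -->
    <property>title</property>
  </index-rule>
</configuration>
```

The file is then referenced from the SearchIndex element with an indexingConfiguration param. The path below is only an example location:

```xml
<param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
```

Index rules change what ends up in the index, so a re-index is needed after adding them. Please check the wiki page for how nodes that match no rule are handled before relying on this.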
http://wiki.apache.org/jackrabbit/IndexingConfiguration

On Thu, Sep 5, 2013 at 10:39 AM, pgupta <[email protected]> wrote:

> Hi,
>
> We have a moderate sized repository with roughly the following size:
> * Around 1M total objects
> * Around 100K documents (PDFs, office docs, text, xml etc)
> * Around 3TB of data in datastore (majority of which are non-indexable
> binary files)
>
> Recently we had to re-index the repository as the search index got out of
> sync with the rest of the data. During that we encountered out-of-memory
> issue several times. We had to increase the heap size to 64GB before the
> re-indexing finally finished. The total RAM taken up by the Java process
> during re-indexing steadily climbed to 60GB and stayed there till the
> indexing finished.
>
> We are using pretty standard search configuration as shown below:
>
> <SearchIndex
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>
>
>
>
> </SearchIndex>
>
> We tried playing with a few configuration settings such as
> extractorPoolSize, maxMergeDocs etc without any appreciable impact on RAM
> usage.
>
> Some questions that we have are:
> 1) Is this high memory usage expected during indexing?
> 2) Can we make any configuration change to manage it?
> 3) Are there any improvements expected in Jackrabbit 3 (Project Oak)?
>
> Thanks,
> Pankaj
>
> --
> View this message in context:
> http://jackrabbit.510166.n4.nabble.com/Huge-memory-usage-while-re-indexing-tp4659465.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
>

--
Cody Burleson
Enterprise Web Architect, Base22
Mobile: +1 (214) 537-8782
Skype: codyburleson
Email: [email protected]
Blog: codyburleson.com
<http://base22.com>
Check my free/busy time. <http://www.google.com/calendar/embed?src=cody.burleson%40base22.com&ctz=America/Chicago%20>
