It is a good idea to turn off supportHighlighting, especially if you aren't
using the functionality; it takes up a lot of extra space in the index.
I am not sure where you heard that the Lucene index is kept in memory, but I
am pretty certain that is wrong. Can you point me to the documentation
that says this?

Also, what data set sizes are you querying against (10k nodes? 100k nodes?
1 million nodes?)?
What heap size do you have set on the JVM?
Reducing the resultFetchSize should help reduce the memory footprint of
queries.
I am assuming you are using the QueryManager to retrieve nodes. Can you
give an example query that you are using?
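
For reference, here is a minimal sketch of what the two SearchIndex settings
mentioned above could look like, assuming the stock Jackrabbit 2.x SearchIndex
class. The path value is an assumption (it is what the default configuration
typically ships with), and 100 is simply the value you suggested, not a tuned
recommendation. In a plain Jackrabbit setup this element sits inside the
Workspace section of repository.xml and in each existing workspace.xml; where
Sling keeps its copy may differ.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Assumed index location; keep whatever your existing config uses -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Skip storing the extra highlighting data in the index -->
  <param name="supportHighlighting" value="false"/>
  <!-- Fetch query results in batches of 100 (value taken from your message) -->
  <param name="resultFetchSize" value="100"/>
</SearchIndex>
```

Note that the Workspace section of repository.xml only acts as a template for
new workspaces, so an existing workspace needs the change in its own
workspace.xml. As far as I know, highlighting data already written to the
index only goes away once the content is re-indexed.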

I have developed a patch to improve query performance on large data sets
with Jackrabbit 2.x. It should be done soon if I can find a few hours to
finish up my work. If you would like, you can give it a try once I finish.

Some other repository settings you might want to look at are:

<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.DerbyPersistenceManager">
  <param name="bundleCacheSize" value="256"/>
</PersistenceManager>
<ISMLocking class="org.apache.jackrabbit.core.state.FineGrainedISMLocking"/>


Hope this helps.

On Mon, Nov 23, 2015 at 12:13 PM, Roll, Kevin <[email protected]> wrote:

> Our use case is the following: an external process generates 70 images,
> each around 700k in size. These are uploaded as sub-nodes under a master
> node that encapsulates the run. There are also some sister nodes that
> contain a modest amount of metadata about each image and the run that
> generated it. In general most of the writing consists of a client POSTing
> these images into the repository via Sling; there are then some event
> handlers and tasks that look at the data that arrived. The only subsequent
> writes at present are some properties that are set after these images are
> examined and replicated into another system. So, I don't expect much at all
> in the way of concurrent read/write; it's mainly write a bunch and then
> read it back later.
>
> By heavy pressure what I mean is that we have a test lab running
> continuously against this system. It's a lot more traffic than can be
> expected in the real world, but it is good for shaking out problems. What
> concerns me is that according to the documentation an entire Lucene index
> is kept in memory. Right now we don’t do any pruning - our repository only
> grows larger. This implies to me that the index will only grow as well and
> we will ultimately run out of memory no matter how big the heap is.
> Hopefully I'm wrong about that.
>
> At the moment we have no JVM flags set. The SearchIndex configuration is
> also default (by default I mean what came with Sling), although I am
> looking at turning off supportHighlighting and putting a small value for
> resultFetchSize, say 100.
>
> -----Original Message-----
> From: Ben Frisoni [mailto:[email protected]]
> Sent: Monday, November 23, 2015 11:55 AM
> To: [email protected]
> Subject: Re: Memory usage
>
> A little description of what you mean by heavy pressure might help. Does
> this involve concurrent read operations, write operations, or both?
>
> Also, some other things that affect performance:
> 1. What JVM parameters are set?
> 2. Do you have any custom index configurations set?
> 3. What does your repository.xml look like?
>
> This background info might help with answering your question.
>
> On Mon, Nov 23, 2015 at 8:13 AM, Roll, Kevin <[email protected]> wrote:
>
> > We have started to encounter OutOfMemory errors on Jackrabbit under heavy
> > pressure (it's worth noting that we are using the full Sling stack). I've
> > discovered that Lucene keeps a full index of the repository in memory, and
> > this terrifies me because we are already having problems just in a test
> > scenario and the repository will only grow. Unfortunately we are forced to
> > run this system on older 32-bit hardware in the field that does not have
> > any room to expand memory-wise. Are there any options I can tweak to reduce
> > the memory footprint? Any other things I can disable that will cut down on
> > memory usage? Is Oak better in this regard? Thanks!
> >
> >
>
