I am still working through the out-of-memory issue. The problem seems to be
identical to what I saw in November - a potentially unbounded query that eats
up memory. I thought that configuring a resultFetchSize in Jackrabbit had fixed
the issue, but apparently not, and I'm not sure that this parameter is having
any effect.
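In case it matters, resultFetchSize is set in the SearchIndex section of our workspace.xml, roughly like this (a sketch based on the stock Jackrabbit 2.x Lucene SearchIndex configuration; the path parameter is just the usual default):

    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${wsp.home}/index"/>
        <param name="resultFetchSize" value="100"/>
    </SearchIndex>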
I'm now experimenting with using the QueryManager directly and setting a limit:
final Session session = resourceResolver.adaptTo(Session.class);
final QueryManager queryManager = session.getWorkspace().getQueryManager();
final Query query = queryManager.createQuery(QUERY_STRING, Query.JCR_SQL2);
query.setLimit(MAX_NODES_TO_PROCESS);
final RowIterator rowIterator = query.execute().getRows();
while (rowIterator.hasNext()) {
    final Row row = rowIterator.nextRow();
    final String path = row.getPath();
    // process the node at 'path'...
}
The query execution is still using more memory than I'd like (all I want is the
path!), but it appears to be stable. My question is whether setLimit() actually
passes that value down to Lucene. I traced into the Sling code and got lost in
the lower levels, but as far as I could tell the value is pushed downward. So,
can anyone clarify whether this is an actual constraint on Lucene? To put the
question a different way: will Lucene use approximately the same amount of
memory to run my query no matter how large my repository gets? What I am
desperately trying to avoid is an unbounded query execution that will
eventually fail given a large enough repository.
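To make the question more concrete, what I am ultimately aiming for is bounded,
batched processing along these lines (just a sketch reusing the queryManager
from above; BATCH_SIZE and processPath() are placeholders, and it assumes
setOffset()/setLimit() behave as documented in JCR 2.0):

    long offset = 0;
    while (true) {
        final Query query = queryManager.createQuery(QUERY_STRING, Query.JCR_SQL2);
        query.setOffset(offset);
        query.setLimit(BATCH_SIZE);
        final RowIterator rows = query.execute().getRows();
        if (!rows.hasNext()) {
            break; // no more results
        }
        while (rows.hasNext()) {
            // processPath() stands in for whatever we do with each result
            processPath(rows.nextRow().getPath());
        }
        offset += BATCH_SIZE;
    }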
From: Roll, Kevin
Sent: Wednesday, March 23, 2016 3:54 PM
To: '[email protected]' <[email protected]>
Subject: Out of memory during query
Back in November we had an out-of-memory problem with our Sling application. I
determined that a periodic task was executing a query that appeared to be
unlimited in terms of result set size, which would eat up memory as the
repository grew. In order to combat this I marked the nodes I am interested in
with a boolean flag, and I configured Jackrabbit to set the resultFetchSize to
100. This seemed to solve the problem and we had no further issues - until last
week, when the problem reappeared.
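For context, the flag is simply a boolean property written onto the nodes we
care about, roughly like this (a sketch; the real code obtains the node and
session differently):

    final Node node = session.getNode(path);
    node.setProperty("uploadToImageManagerFlag", true);
    session.save();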
I've been able to determine that the problem is entirely in the execution of
this query. I can enter it from the JCR Explorer query window and it will cause
the runaway memory problem. The query is very straightforward; it is simply:
select * from [nt:base] where uploadToImageManagerFlag = true
I have no need for any parallel results; I simply want to examine the resultant
Resources one at a time. Deleting and rebuilding the Jackrabbit indexes did not
help.
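For illustration, the one-at-a-time consumption I have in mind is essentially
this (a sketch against the standard Sling ResourceResolver.findResources() API,
not our exact task code; QUERY_STRING stands for the query above):

    final Iterator<Resource> resources =
            resourceResolver.findResources(QUERY_STRING, Query.JCR_SQL2);
    while (resources.hasNext()) {
        final Resource resource = resources.next();
        // all we really need from each result is resource.getPath()
    }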
Any ideas why this query might be causing runaway memory consumption? Looking
at a heap dump it appears that there are massive numbers of NodeId,
HashMap$Entry, NameSet, ChildNodeEntry, NodeState, etc. It seems that for
whatever reason a large number of nodes are being pulled into memory.
If this would make more sense on the Jackrabbit list I can ask over there as
well.
Thanks!