A few observations.

1. The query you created is extremely broad: it uses nt:base and has no restrictions such as a path. If you're going to create a query, the more restrictive you can make it, the better (see the example below this list).

2. I'm not sure whether you're using plain Jackrabbit or Oak. If you're using Oak, be sure to create an index on the property (there is a sketch of a property index definition below).

3. Queries are generally slow. In the most counter-intuitive experience I've ever had, we discovered that it is far faster to manually descend through the resources and identify the items you are searching for than with any query we've created (a rough sketch of that approach is also below).
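
For example, something along these lines would already be far more selective than nt:base with no path. The node type and the path here are only placeholders, since I don't know where your flagged nodes actually live:

    select * from [nt:unstructured] as n
      where isdescendantnode(n, '/content/uploads')
      and n.[uploadToImageManagerFlag] = true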
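
If it is Oak, the usual approach is a property index node under /oak:index. Roughly something like this through the JCR API; this is a sketch only, it assumes a session with write access to /oak:index, and the index node name is just an example:

    import javax.jcr.Node;
    import javax.jcr.PropertyType;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    // Sketch only: define a standard Oak property index for the flag.
    public static void createFlagIndex(Session session) throws RepositoryException {
        final Node oakIndex = session.getNode("/oak:index");
        if (!oakIndex.hasNode("uploadToImageManagerFlag")) {
            final Node def = oakIndex.addNode("uploadToImageManagerFlag",
                    "oak:QueryIndexDefinition");
            def.setProperty("type", "property");
            def.setProperty("propertyNames",
                    new String[] { "uploadToImageManagerFlag" }, PropertyType.NAME);
            def.setProperty("reindex", true); // reindex existing content on save
            session.save();
        }
    }

With an index like that in place the query engine can answer the flag lookup from the index instead of walking the whole repository.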
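
And to illustrate the third point, here is roughly what I mean by descending manually, using the Sling Resource API. The start path and property name are only examples taken from your mail; the point is that only the matching paths are held in memory:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.sling.api.resource.Resource;
    import org.apache.sling.api.resource.ResourceResolver;
    import org.apache.sling.api.resource.ValueMap;

    // Sketch only: walk the tree instead of querying, collecting just the paths.
    public static List<String> findFlaggedPaths(ResourceResolver resolver, String startPath) {
        final List<String> paths = new ArrayList<>();
        final Resource root = resolver.getResource(startPath);
        if (root != null) {
            collect(root, paths);
        }
        return paths;
    }

    private static void collect(Resource resource, List<String> paths) {
        final ValueMap props = resource.getValueMap();
        if (Boolean.TRUE.equals(props.get("uploadToImageManagerFlag", Boolean.class))) {
            paths.add(resource.getPath()); // keep only the path, not the node
        }
        for (Resource child : resource.getChildren()) {
            collect(child, paths); // depth-first descent
        }
    }

It isn't elegant, but the memory footprint stays proportional to the number of matches rather than to whatever the query engine decides to pull into memory.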

-----Original Message-----
From: Roll, Kevin [mailto:[email protected]]
Sent: Friday, March 25, 2016 12:46 PM
To: [email protected]
Subject: RE: Out of memory during query

I am still working through the out-of-memory issue. The problem seems to be identical to what I saw in November - a potentially unbounded query that eats up memory. I thought that configuring a resultFetchSize in Jackrabbit had fixed the issue, but apparently not, and I'm not sure that this parameter is having any effect. I'm now experimenting with using the QueryManager directly and setting a limit:

    final Session session = resourceResolver.adaptTo(Session.class);
    final QueryManager queryManager = session.getWorkspace().getQueryManager();
    final Query query = queryManager.createQuery(QUERY_STRING, Query.JCR_SQL2);
    query.setLimit(MAX_NODES_TO_PROCESS);
    final RowIterator rowIterator = query.execute().getRows();
    while (rowIterator.hasNext())

The query execution is still using more memory than I like (all I want is the path!) but it appears to be stable. My question is whether the setLimit() is actually passing that value to Lucene. I traced down into the Sling code, and got lost in the lower levels, but as far as I could tell that value is pushed downward. So, can anyone clarify if this will be an actual constraint on Lucene? To put the question a different way, will Lucene use approximately the same amount of memory to run my query no matter how large my repository gets? What I am desperately trying to avoid is an unbounded query execution that will eventually fail given a large enough repository.

From: Roll, Kevin
Sent: Wednesday, March 23, 2016 3:54 PM
To: '[email protected]' <[email protected]>
Subject: Out of memory during query

Back in November we had an out-of-memory problem with our Sling application. I determined that a periodic task was executing a query that appeared to be unlimited in terms of result set size, which would eat up memory as the repository grew. In order to combat this I marked the nodes I am interested in with a boolean flag, and I configured Jackrabbit to set the resultFetchSize to 100. This seemed to solve the problem and we had no further issues - until last week, when the problem reappeared.

I've been able to determine that the problem is entirely in the execution of this query. I can enter it from the JCR Explorer query window and it will cause the runaway memory problem. The query is very straightforward, it is simply:

    select * from [nt:base] where uploadToImageManagerFlag = true

I have no need for any parallel results, I simply want to examine the resultant Resources one at a time. Deleting/rebuilding the Jackrabbit indexes did not help.

Any ideas why this query might be causing runaway memory consumption? Looking at a heap dump it appears that there are massive numbers of NodeId, HashMap$Entry, NameSet, ChildNodeEntry, NodeState, etc. It seems that for whatever reason a large number of nodes are being pulled into memory. If this would make more sense on the Jackrabbit list I can ask over there as well. Thanks!
