Hi all, I am experiencing serious out-of-memory errors when querying Nutch, and would appreciate any pointers or advice. I have a Nutch index that I'm searching using a simple servlet. This servlet queries the index and returns the results as XML, so that other systems on my network can use the index as a web service.
In a nutshell, the problem seems to be that after successive queries to this servlet, the Tenured Gen grows until I run out of heap space. I am running Nutch-1.0 with the NUTCH-738 and NUTCH-746 patches applied (more about that below), Tomcat 6.0.20 and Sun's JVM 1.6.0_12-b04 on Debian Lenny 32-bit. I have also tested with OpenJDK and got the same results.

My servlet just does the following:

    Configuration nutchConf = NutchConfiguration.create();
    Path configPath = new Path(NUTCH_DIR + "/conf/" + site + "/nutch-site.xml");
    nutchConf.addResource(configPath);
    NutchBean nutchBean = new NutchBean(nutchConf);
    Query nutchQuery = Query.parse(nutchSearchString, nutchConf);
    Hits nutchHits = nutchBean.search(nutchQuery, maxResults);
    ...
    ... Format the results as XML and output them ...
    nutchBean.close();

After a few hundred queries my Tenured Gen is up to 50 MB; after a few thousand requests I end up with over 500 MB used. I can of course increase my heap size, but no matter what I set it to, eventually it all gets consumed and the only option is to restart Tomcat.

I have obtained a heap dump and run it through jhat, but to be honest I'm not really sure what I'm looking for. I've made the dump available at http://www.markround.com/static/tomcat.hprof, in case that helps anyone investigate further. For what it's worth, I didn't seem to get this issue with Nutch-0.9.

Regarding the two patches I have applied: I had to use them because otherwise I get a lot of threads in the TIMED_WAITING state, which according to Lambda Probe are stuck here:

    java.lang.Thread.sleep ( native code )
    org.apache.nutch.searcher.FetchedSegments$SegmentUpdater.run ( FetchedSegments.java:115 )

With the two patches applied I still get lots of these "stuck" threads, but they do seem to get cleaned up eventually. I wonder if this could have anything to do with the problem?
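
In case it helps anyone reproduce the numbers, here is a minimal sketch of how the heap pools and the sleeping SegmentUpdater threads could be sampled from inside the JVM. It assumes only the standard java.lang.management API; the JvmSnapshot class and its method names are hypothetical and not part of Nutch. The name of the old-generation pool depends on the collector in use (for example "Tenured Gen" with the serial collector, "PS Old Gen" with the parallel one).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: samples heap pools and thread stacks.
final class JvmSnapshot {

    // Bytes currently used in each heap memory pool, keyed by pool name.
    static Map<String, Long> heapPoolUsage() {
        Map<String, Long> used = new LinkedHashMap<String, Long>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                used.put(pool.getName(), Long.valueOf(usage.getUsed()));
            }
        }
        return used;
    }

    // Number of live threads that have a frame of the given class on their stack.
    static int countThreadsIn(String className) {
        int count = 0;
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            for (StackTraceElement frame : stack) {
                if (frame.getClassName().equals(className)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : heapPoolUsage().entrySet()) {
            System.out.println(e.getKey() + ": " + (e.getValue() / (1024 * 1024)) + " MB used");
        }
        System.out.println("SegmentUpdater threads: "
                + countThreadsIn("org.apache.nutch.searcher.FetchedSegments$SegmentUpdater"));
    }
}
```

Run standalone, this only reports on its own JVM. To be useful here, the two static methods would have to be called from code running inside the same Tomcat instance, before and after a batch of queries, to see whether the old generation and the SegmentUpdater thread count grow together.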
Please let me know if there are any other diagnostics I can run, or information I can provide.

Many thanks,
-Mark
