Hi guys,
I'm writing a server / REST API for Nutch, but I'm running into a memory
leak.
I simplified the problem down to this: crawling a site repeatedly (as below)
eventually runs out of memory. Watching the running JVM with VisualVM, the
PermGen space grows steadily until it is exhausted and the application
crashes.
I suspect a memory leak in Nutch or in Hadoop, as I wouldn't expect the code
below to grow its memory footprint indefinitely.
The code:
while (true) {
    // a fresh Configuration is created on every iteration
    Configuration configuration = NutchConfiguration.create();
    String crawlArg = "config/urls/dev -dir crawls/dev -threads 5 -depth 2 -topN 100";
    ToolRunner.run(configuration, new Crawl(), MiscUtils.tokenize(crawlArg));
}
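One workaround I'm considering (just a sketch; the main class
org.apache.nutch.crawl.Crawl and reusing the parent's classpath are
assumptions you'd adapt to your own setup): run each crawl in a child JVM,
so whatever PermGen the crawl consumes is reclaimed when the child process
exits, and the loop in the parent stays flat.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrawlLauncher {

    // Build the command line for a child JVM running one crawl.
    // The main class "org.apache.nutch.crawl.Crawl" and reusing this
    // JVM's classpath are assumptions; adjust them to your deployment.
    static List<String> buildCommand(String... crawlArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + "/bin/java");
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add("org.apache.nutch.crawl.Crawl");
        cmd.addAll(Arrays.asList(crawlArgs));
        return cmd;
    }

    // Run one crawl in a child JVM and wait for it to finish.
    // Any PermGen the crawl fills up is freed when the child exits.
    static int runCrawlInChildJvm(String... crawlArgs)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(crawlArgs));
        pb.inheritIO();              // stream the crawl's output to our console
        return pb.start().waitFor(); // block until the crawl completes
    }
}
```

The repeat-crawl loop then becomes something like
`while (runCrawlInChildJvm("config/urls/dev", "-dir", "crawls/dev", "-threads", "5", "-depth", "2", "-topN", "100") == 0) { }`,
at the cost of one JVM startup per crawl.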
Anything I can do on my side to fix this?
Thanks for all comments,
Yann
--
View this message in context:
http://lucene.472066.n3.nabble.com/Memory-leak-when-crawling-repeatedly-tp4106960.html
Sent from the Nutch - User mailing list archive at Nabble.com.