Hi guys,

I'm writing a server / rest API for Nutch, but I'm running into a memory
leak issue.

I simplified the problem down to this: crawling a site repeatedly (as below)
will eventually run out of memory. Watching the running JVM with VisualVM,
the PermGen space grows steadily until it is exhausted and the application
crashes.

I suspect there is a memory leak in Nutch or in Hadoop, as I wouldn't expect
the code below to grow its memory footprint indefinitely.
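
For what it's worth, PermGen (rather than heap) growth usually points at a
classloader leak: class metadata can only be unloaded once its ClassLoader
becomes unreachable, so a loader pinned in some static cache pins all of its
classes' metadata for the life of the JVM. A minimal JDK-only sketch of that
mechanism (nothing here is Nutch code; the names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the PermGen-leak mechanism: a static
// cache that retains one ClassLoader per "job" keeps every class those
// loaders defined alive forever, so PermGen can only grow.
public class LoaderRetention {
    // A static cache like this, e.g. deep inside a library, prevents
    // the loaders (and their classes) from ever being collected.
    static final List<ClassLoader> cache = new ArrayList<ClassLoader>();

    static int retainLoaders(int iterations) {
        for (int i = 0; i < iterations; i++) {
            // A fresh loader per iteration, as a framework might create
            // per job; once added to the cache it can never be unloaded.
            cache.add(new ClassLoader(LoaderRetention.class.getClassLoader()) {});
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("retained loaders: " + retainLoaders(3));
    }
}
```

If something similar happens once per crawl cycle inside Nutch or Hadoop,
it would match the steady PermGen growth I'm seeing.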

The code:

while (true) {
    Configuration configuration = NutchConfiguration.create();
    String crawlArg = "config/urls/dev -dir crawls/dev -threads 5 -depth 2 -topN 100";
    ToolRunner.run(configuration, new Crawl(), MiscUtils.tokenize(crawlArg));
}

Anything I can do on my side to fix this?

Thanks for all comments,

Yann

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Memory-leak-when-crawling-repeatedly-tp4106960.html
Sent from the Nutch - User mailing list archive at Nabble.com.