I am seeing a serious memory leak with Nutch 1.2 during a very deep crawl. I get an error like this:

Exception in thread "Thread-113544" java.lang.OutOfMemoryError: PermGen space
    at java.lang.Throwable.getStackTraceElement(Native Method)
    at java.lang.Throwable.getOurStackTrace(Throwable.java:591)
    at java.lang.Throwable.printStackTrace(Throwable.java:510)
    at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:76)
    at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:407)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:305)
    at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:359)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at org.slf4j.impl.Log4jLoggerAdapter.log(Log4jLoggerAdapter.java:509)
    at org.apache.commons.logging.impl.SLF4JLocationAwareLog.warn(SLF4JLocationAwareLog.java:173)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:256)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1107)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:133)
I suspect the plugin repository cache is causing the leak. As you know, plugins are stored in a WeakHashMap<conf, plugins>, and a new class loader is created whenever plugins are needed. Normally the WeakHashMap entries can be garbage-collected, but classes and class loaders are stored in PermGen, not on the regular heap, and GC does not seem to reclaim them there, so java.lang.OutOfMemoryError: PermGen space occurs.

Has any Nutch issue addressed this problem, or is there a solution? NUTCH-501 <https://issues.apache.org/jira/browse/NUTCH-501> and NUTCH-356 <https://issues.apache.org/jira/browse/NUTCH-356> may help.

Thanks!
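To make the suspected mechanism concrete, here is a minimal Java sketch of a cache keyed by configuration object, where every cache miss builds a repository with its own class loader. It is not the actual Nutch code: `PluginCacheSketch`, `Conf`, `Repo`, `get` and `cacheSize` are all hypothetical names chosen for illustration, and `Conf` is only a stand-in for the real configuration class. It assumes the key is compared by identity, so each new configuration instance produces a new entry and a new class loader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.WeakHashMap;

// Illustrative sketch only; every name here is hypothetical, not Nutch's API.
class PluginCacheSketch {

    // Stand-in for a configuration object. It does not override
    // equals/hashCode, so two instances are never equal as map keys.
    static final class Conf {}

    // Stand-in for a plugin repository that owns a dedicated class loader.
    // It must not hold a strong reference to its Conf key, otherwise the
    // weak key could never be cleared.
    static final class Repo {
        final ClassLoader loader =
            new URLClassLoader(new URL[0], Repo.class.getClassLoader());
    }

    // Weak keys: an entry becomes collectable once its Conf is unreachable.
    private static final Map<Conf, Repo> CACHE = new WeakHashMap<>();

    // Returns the cached repository for this conf, creating one on a miss.
    static synchronized Repo get(Conf conf) {
        return CACHE.computeIfAbsent(conf, c -> new Repo());
    }

    // Current number of live cache entries (may shrink after a GC).
    static synchronized int cacheSize() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        Conf shared = new Conf();
        Repo first = get(shared);
        Repo second = get(shared);
        System.out.println("same conf reuses repo: " + (first == second));

        // A fresh Conf instance is a cache miss, so a new loader is built.
        Repo other = get(new Conf());
        System.out.println("new conf gets new loader: "
            + (first.loader != other.loader));
    }
}
```

If a crawl creates a fresh configuration object for each job, a cache shaped like this would build a new class loader each time. Whether the classes it loads are ever unloaded then depends on the class loader itself becoming unreachable, which is the part I suspect is failing here.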

