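Hi Deals,

<< - Can I add -XX:MaxPermSize into JAVA_OPTS instead? >>

Yes, you can add it to JAVA_OPTS, but I can't find that variable in the bin/nutch script file. JAVA_OPTS is only a convention used by launcher scripts (the java command itself does not read it), so it takes effect only if the script that starts your JVM passes it on.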
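Wherever the flag ends up (JAVA_OPTS, NUTCH_OPTS or the startup script of the application that runs the crawl), one quick way to confirm it actually reached the JVM is to ask that JVM for its startup arguments through the standard java.lang.management API. This is a minimal sketch: the class and method names are hypothetical, and in practice you would log the same values from inside the process that runs the crawl rather than from a separate program.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.List;

// Hypothetical helper: reports which JVM flags were really passed to this JVM
// and the configured maximum of the class-metadata (PermGen) pool.
public class PermGenFlagCheck {

    /** JVM flags this JVM was started with, e.g. "-XX:MaxPermSize=256m" (excludes main-class args). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** True if any of the given flags starts with the given prefix. */
    static boolean hasFlag(List<String> args, String prefix) {
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = jvmArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-XX:MaxPermSize passed: " + hasFlag(jvmArgs, "-XX:MaxPermSize"));

        // On HotSpot Java 7 and earlier the pool is named e.g. "PS Perm Gen" or "CMS Perm Gen";
        // on Java 8+ PermGen was replaced by "Metaspace".
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Perm Gen") || name.equals("Metaspace")) {
                long max = pool.getUsage().getMax(); // -1 means no maximum is defined
                System.out.println(name + " max = "
                        + (max < 0 ? "unlimited" : (max / (1024 * 1024)) + " MB"));
            }
        }
    }
}
```

For example, on a Java 7 or earlier JVM, running `java -XX:MaxPermSize=256m PermGenFlagCheck` should list the flag among the printed JVM arguments.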
<< - Does the garbage collector clean up PermGen space? If it doesn't, and Nutch keeps adding to PermGen space, I guess the PermGen issue will come back, even with the bigger size? >>

The GC does not reclaim PermGen space the way it reclaims the normal heap. Classes are unloaded only when the class loader that loaded them is no longer reachable, and the CMS collector's concurrent cycles unload classes only if -XX:+CMSClassUnloadingEnabled is set. So if your application keeps loading a lot of classes, it will cause a PermGen space error, and a bigger MaxPermSize only delays it.

I see that this problem also happens during the inject step, but I can't find anything in that step that would cause a PermGen space error: the Injector just takes a flat file of URLs and adds them to the CrawlDb of pages to be crawled. There may be other factors causing this problem.

On Wed, Mar 20, 2013 at 10:45 AM, Deals Collect <[email protected]> wrote:
> Hi Feng,
>
> Thanks for your reply. I'm going to add these options to the OPTS:
> -XX:MaxPermSize, -XX:+CMSPermGenSweepingEnabled and
> -XX:+CMSClassUnloadingEnabled.
> I'm still not clear about a couple of things:
>
> - Can I add -XX:MaxPermSize into JAVA_OPTS instead?
> - Does the garbage collector clean up PermGen space? If it doesn't, and
> Nutch keeps adding to PermGen space, I guess the PermGen issue will come
> back, even with the bigger size?
>
> Many thanks,
> Vu
>
> On Wed, Mar 20, 2013 at 1:20 PM, feng lu <[email protected]> wrote:
>
> > Hi Deals,
> >
> > You can edit the bin/nutch script file and increase the Permanent
> > Generation space through the NUTCH_OPTS variable, with code like this:
> >
> > JAVA=$JAVA_HOME/bin/java
> > JAVA_HEAP_MAX=-Xmx1000m
> > NUTCH_OPTS="-XX:PermSize=128M -XX:MaxPermSize=256m"
> >
> > On Wed, Mar 20, 2013 at 9:31 AM, Deals Collect <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I'm using Nutch 1.4 and Solr 3.6.1. Crawling is working well: it crawls
> > > data and sends it to Solr perfectly. But sometimes the crawl fails, and
> > > right after that I get java.lang.OutOfMemoryError: PermGen space.
> > > Here is the log file:
> > >
> > > java.io.IOException: Job failed!
> > > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> > > at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
> > > at org.apache.nutch.crawl.Crawl.run(Crawl.java:138)
> > > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
> > > at jobs.cudo.GoldCoastCudoNutchCrawler.doJob(GoldCoastCudoNutchCrawler.java:23)
> > > at play.jobs.Job.doJobWithResult(Job.java:50)
> > > at play.jobs.Job.call(Job.java:146)
> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> > > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > at java.lang.Thread.run(Thread.java:662)
> > >
> > > java.io.IOException: Job failed!
> > > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> > > at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
> > > at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
> > > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
> > > at jobs.cudo.HobartCudoNutchCrawler.doJob(HobartCudoNutchCrawler.java:23)
> > > at play.jobs.Job.doJobWithResult(Job.java:50)
> > > at play.jobs.Job.call(Job.java:146)
> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> > > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > at java.lang.Thread.run(Thread.java:662)
> > >
> > > Every time I get "Job failed!", I get the memory problem right after that:
> > >
> > > Exception in thread "TP-Processor8" Exception in thread "TP-Processor1" Exception in thread "TP-Processor7" java.lang.OutOfMemoryError: PermGen space
> > > Exception in thread "TP-Processor6" java.lang.OutOfMemoryError: PermGen space
> > > Exception in thread "TP-Processor9" java.lang.OutOfMemoryError: PermGen space
> > > Exception in thread "TP-Processor3" java.lang.OutOfMemoryError: PermGen space
> > > Exception in thread "TP-Processor12" java.lang.OutOfMemoryError: PermGen space
> > > Exception in thread "TP-Processor5" java.lang.OutOfMemoryError: PermGen space
> > > Exception in thread "TP-Processor10" java.lang.OutOfMemoryError: PermGen space
> > >
> > > Does anyone know about this issue?
> > >
> > > Many thanks,
> > > Vu Pham
> >
> > --
> > Don't Grow Old, Grow Up... :-)
>

--
Don't Grow Old, Grow Up... :-)
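To check whether classes really keep piling up in PermGen from one crawl to the next (the case in which a bigger MaxPermSize only postpones the error), one option is to sample the JVM's class-loading counters and the PermGen pool usage before and after each run. The stack traces in this thread show the crawl running inside the Play application's own JVM (through jobs.CrawlerUtils.crawlJob), so that is where such a check would have to run. A minimal sketch using only the standard java.lang.management API follows; the class and method names are hypothetical, and reading the numbers this way is a heuristic, not proof of a leak.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: snapshots class loading/unloading counters and
// the current usage of the class-metadata (PermGen / Metaspace) pool.
public class ClassLoadingMonitor {

    /** One-line summary of class loading and unloading since JVM start. */
    static String classLoadingSummary() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        return "loaded now=" + cl.getLoadedClassCount()
                + ", loaded total=" + cl.getTotalLoadedClassCount()
                + ", unloaded total=" + cl.getUnloadedClassCount();
    }

    /** Used size of the class-metadata pool ("... Perm Gen" on Java 7 and earlier, "Metaspace" on Java 8+). */
    static String classMetadataUsage() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Perm Gen") || name.equals("Metaspace")) {
                MemoryUsage usage = pool.getUsage(); // null only if the pool is no longer valid
                if (usage != null) {
                    sb.append(name).append(": used=").append(usage.getUsed() / 1024).append(" KB");
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Intended to be called before and after each crawl run. If "loaded now" and the
        // pool usage keep climbing run after run while "unloaded total" stays flat, classes
        // are probably being kept alive by class loaders that are still reachable.
        System.out.println(classLoadingSummary());
        System.out.println(classMetadataUsage());
    }
}
```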

