Hi Feng,

It doesn't happen only in the inject processing but also in the parse
processing. Below are some of the "Job failed!" errors I get, and right after
each one I hit the PermGen issue:

java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:138)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
    at jobs.cudo.GoldCoastCudoNutchCrawler.doJob(GoldCoastCudoNutchCrawler.java:23)
    at play.jobs.Job.doJobWithResult(Job.java:50)
    at play.jobs.Job.call(Job.java:146)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)




java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
    at jobs.cudo.BrisbaneCudoNutchCrawler.doJob(BrisbaneCudoNutchCrawler.java:23)
    at play.jobs.Job.doJobWithResult(Job.java:50)
    at play.jobs.Job.call(Job.java:146)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)




java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
    at jobs.cudo.HobartCudoNutchCrawler.doJob(HobartCudoNutchCrawler.java:23)
    at play.jobs.Job.doJobWithResult(Job.java:50)
    at play.jobs.Job.call(Job.java:146)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)


Many thanks,
Vu


On Wed, Mar 20, 2013 at 2:13 PM, feng lu <[email protected]> wrote:

> Hi Deals
>
> <<
> - Can I add -XX:MaxPermSize into JAVA_OPTS instead?
> >>
> Yes, you can add it to JAVA_OPTS, but I cannot find this parameter in the
> bin/nutch script file.
>
> <<
> -  Does the garbage collector clean up the PermGen space? If it doesn't,
> and Nutch keeps adding classes to the PermGen space, I guess it will hit
> the PermGen issue again (just with a bigger size)?
>
> >>
> GC cannot clean the PermGen space, so if your application loads a lot of
> classes, it will cause a PermGen space error.
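> As a hedged sketch (the variable name JAVA_OPTS and the exact startup script
> depend on your Play setup; the flags themselves are standard HotSpot options
> on JDK 6/7, and PermGen no longer exists on Java 8+, where -XX:MaxPermSize
> is ignored):

```shell
# Sketch only: grow PermGen and let the CMS collector unload dead classes.
# Note: -XX:+CMSClassUnloadingEnabled only takes effect together with the
# CMS collector, i.e. -XX:+UseConcMarkSweepGC.
export JAVA_OPTS="$JAVA_OPTS -XX:PermSize=128m -XX:MaxPermSize=256m \
  -XX:+UseConcMarkSweepGC \
  -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"
```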
>
> I see that this problem also happens in the inject processing, but I can't
> find anything in that step that should cause a PermGen space error: it just
> takes a flat file of URLs and adds them to the list of pages to be crawled.
> There may be additional factors causing this problem.
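> If you want to confirm that PermGen really is what fills up across crawl
> jobs, jstat (shipped with the JDK) can sample it; the <pid> placeholder
> below stands for your Play/Nutch JVM's process id:

```shell
# Sketch: print GC statistics for the JVM every 5 seconds; on JDK 7 the
# PC and PU columns are PermGen capacity and utilization in KB.
jstat -gc <pid> 5000
```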
>
>
>
>
> On Wed, Mar 20, 2013 at 10:45 AM, Deals Collect <[email protected]> wrote:
>
> > Hi Feng,
> >
> > Thanks for your reply. I'm going to add these options to the OPTS:
> > -XX:MaxPermSize, -XX:+CMSPermGenSweepingEnabled and
> > -XX:+CMSClassUnloadingEnabled.
> > I'm still not clear about a couple of things:
> >
> > - Can I add -XX:MaxPermSize into JAVA_OPTS instead?
> > -  Does the garbage collector clean up the PermGen space? If it doesn't,
> > and Nutch keeps adding classes to the PermGen space, I guess it will hit
> > the PermGen issue again (just with a bigger size)?
> >
> > Many thanks,
> > Vu
> >
> >
> >
> > On Wed, Mar 20, 2013 at 1:20 PM, feng lu <[email protected]> wrote:
> >
> > > Hi Deals
> > >
> > > You can edit the bin/nutch script file and add the Permanent Generation
> > > space options to the NUTCH_OPTS param, something like this:
> > >
> > > JAVA=$JAVA_HOME/bin/java
> > > JAVA_HEAP_MAX=-Xmx1000m
> > > NUTCH_OPTS="-XX:PermSize=128M -XX:MaxPermSize=256m"
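> > > For context, here is a rough sketch of how those variables end up on
> > > the java command line near the bottom of a Nutch 1.x bin/nutch script
> > > (simplified; the real script also builds CLASSPATH and picks CLASS from
> > > the subcommand you run):

```shell
# Sketch: the final invocation in bin/nutch. CLASSPATH and CLASS are
# assembled earlier in the script, so anything in NUTCH_OPTS lands on the
# JVM command line of every nutch subcommand.
exec "$JAVA" $JAVA_HEAP_MAX $NUTCH_OPTS -classpath "$CLASSPATH" $CLASS "$@"
```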
> > >
> > >
> > > On Wed, Mar 20, 2013 at 9:31 AM, Deals Collect <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm using Nutch 1.4 and Solr 3.6.1. The crawling is working well: it
> > > > crawls data and sends it to Solr perfectly. But sometimes the crawl
> > > > fails, and right after that I get java.lang.OutOfMemoryError: PermGen
> > > > space. Here is the log file:
> > > >
> > > > java.io.IOException: Job failed!
> > > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> > > >     at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
> > > >     at org.apache.nutch.crawl.Crawl.run(Crawl.java:138)
> > > >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > >     at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
> > > >     at jobs.cudo.GoldCoastCudoNutchCrawler.doJob(GoldCoastCudoNutchCrawler.java:23)
> > > >     at play.jobs.Job.doJobWithResult(Job.java:50)
> > > >     at play.jobs.Job.call(Job.java:146)
> > > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> > > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > >     at java.lang.Thread.run(Thread.java:662)
> > > >
> > > > java.io.IOException: Job failed!
> > > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> > > >     at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
> > > >     at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
> > > >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > >     at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
> > > >     at jobs.cudo.HobartCudoNutchCrawler.doJob(HobartCudoNutchCrawler.java:23)
> > > >     at play.jobs.Job.doJobWithResult(Job.java:50)
> > > >     at play.jobs.Job.call(Job.java:146)
> > > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> > > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > >     at java.lang.Thread.run(Thread.java:662)
> > > >
> > > > Every time I get "Job failed!", I get the memory problem right after
> > > > that:
> > > >
> > > > Exception in thread "TP-Processor8" Exception in thread "TP-Processor1"
> > > > Exception in thread "TP-Processor7" java.lang.OutOfMemoryError: PermGen space
> > > > Exception in thread "TP-Processor6" java.lang.OutOfMemoryError: PermGen space
> > > > Exception in thread "TP-Processor9" java.lang.OutOfMemoryError: PermGen space
> > > > Exception in thread "TP-Processor3" java.lang.OutOfMemoryError: PermGen space
> > > > Exception in thread "TP-Processor12" java.lang.OutOfMemoryError: PermGen space
> > > > Exception in thread "TP-Processor5" java.lang.OutOfMemoryError: PermGen space
> > > > Exception in thread "TP-Processor10" java.lang.OutOfMemoryError: PermGen space
> > > >
> > > > Anyone knows this issue?
> > > >
> > > > Many thanks,
> > > > Vu Pham
> > > >
> > >
> > >
> > >
> > > --
> > > Don't Grow Old, Grow Up... :-)
> > >
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>
