Hi all,

I'm using Nutch 1.4 and Solr 3.6.1. Crawling generally works well: it crawls
the data and sends it to Solr without problems. However, a crawl sometimes
fails, and whenever it does I get a java.lang.OutOfMemoryError: PermGen space
right after that. Here is the log file:

java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:138)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
    at jobs.cudo.GoldCoastCudoNutchCrawler.doJob(GoldCoastCudoNutchCrawler.java:23)
    at play.jobs.Job.doJobWithResult(Job.java:50)
    at play.jobs.Job.call(Job.java:146)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
    at jobs.cudo.HobartCudoNutchCrawler.doJob(HobartCudoNutchCrawler.java:23)
    at play.jobs.Job.doJobWithResult(Job.java:50)
    at play.jobs.Job.call(Job.java:146)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

Every time I get "Job failed!", the following memory errors appear right
after it:

Exception in thread "TP-Processor8" Exception in thread "TP-Processor1"
Exception in thread "TP-Processor7" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor6" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor9" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor3" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor12" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor5" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor10" java.lang.OutOfMemoryError: PermGen space

Does anyone know what causes this?

Many thanks,
Vu Pham
