Hi all,
I'm using Nutch 1.4 and Solr 3.6.1. Crawling generally works well: it crawls
the data and sends it to Solr without problems. However, when a crawl
occasionally fails, I get a java.lang.OutOfMemoryError: PermGen space right
after that. Here is the log:
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:138)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
at jobs.cudo.GoldCoastCudoNutchCrawler.doJob(GoldCoastCudoNutchCrawler.java:23)
at play.jobs.Job.doJobWithResult(Job.java:50)
at play.jobs.Job.call(Job.java:146)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at jobs.CrawlerUtils.crawlJob(CrawlerUtils.java:15)
at jobs.cudo.HobartCudoNutchCrawler.doJob(HobartCudoNutchCrawler.java:23)
at play.jobs.Job.doJobWithResult(Job.java:50)
at play.jobs.Job.call(Job.java:146)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Every time I get "Job failed!", the memory error follows right after it:
Exception in thread "TP-Processor8" Exception in thread "TP-Processor1"
Exception in thread "TP-Processor7" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor6" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor9" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor3" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor12" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor5" java.lang.OutOfMemoryError: PermGen space
Exception in thread "TP-Processor10" java.lang.OutOfMemoryError: PermGen space
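To check whether the permanent generation really grows after each failed job, something like the following could be called before and after each crawl. This is only a minimal sketch: the class name PermGenProbe and its method are hypothetical (not part of Nutch, Hadoop, or Play), and it relies only on the standard java.lang.management API. On a Java 6 HotSpot JVM the relevant pool name usually contains "Perm Gen"; other JVMs name their non-heap pools differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper, not part of Nutch, Hadoop, or Play.
final class PermGenProbe {

    // Returns the used bytes of every non-heap memory pool, keyed by pool name.
    static Map<String, Long> nonHeapPoolUsage() {
        Map<String, Long> used = new LinkedHashMap<String, Long>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // skip heap pools such as Eden or Old Gen
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                used.put(pool.getName(), usage.getUsed());
            }
        }
        return used;
    }

    public static void main(String[] args) {
        // Print one line per non-heap pool, e.g. to compare before and after a crawl.
        for (Map.Entry<String, Long> e : nonHeapPoolUsage().entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " bytes used");
        }
    }
}
```

If the "Perm Gen" figure climbs after every failed crawl and never drops, that would suggest that classes loaded for each job are not being unloaded.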
Has anyone seen this issue before?
Many thanks,
Vu Pham