The log you provided doesn't look like the actual mapper log; can you check again? Besides the output for the main class, the job keeps separate logs for each map and reduce task.
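Each failed attempt ID quoted below encodes its job ID, and on Hadoop 0.20/1.x (the generation EMR ran at the time) the tasktracker that executed an attempt keeps that attempt's stdout, stderr, and syslog under a userlogs directory. A minimal sketch of deriving those paths from one of the reported attempt IDs — the exact directory layout varies by Hadoop version and EMR AMI, so treat the paths as an assumption:

```shell
# One of the failed attempt IDs from the report below.
ATTEMPT="attempt_201206200559_0032_m_000313_0"

# The job ID is embedded in the attempt ID (fields 2 and 3, underscore-delimited).
JOB_ID="job_$(echo "$ATTEMPT" | cut -d_ -f2,3)"

# On the node that ran the attempt, each attempt directory holds stdout,
# stderr, and syslog; an OOM stack trace usually lands in syslog or stderr.
# /var/log/hadoop is an assumed default when HADOOP_LOG_DIR is unset.
LOG_PATH="${HADOOP_LOG_DIR:-/var/log/hadoop}/userlogs/${ATTEMPT}/syslog"

echo "$JOB_ID"
echo "$LOG_PATH"
```

The same per-attempt logs are also reachable through the JobTracker web UI by clicking through from the failed task to its attempts.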
-----Original message-----
> From: sidbatra <[email protected]>
> Sent: Wed 20-Jun-2012 20:29
> To: [email protected]
> Subject: Re: Nutch 1.5 - "Error: Java heap space" during MAP step of CrawlDb update
>
> thanks for the reply.
>
> The MAP tasks are the ones failing and most of them simply fail with:
>
> attempt_201206200559_0032_m_000313_0 task_201206200559_0032_m_000313
> 10.76.89.196 FAILED
> Error: Java heap space
>
> Some of the MAP tasks have a trace as follows:
>
> attempt_201206200559_0032_m_000322_1 task_201206200559_0032_m_000322
> 10.242.110.38 FAILED
> java.lang.Throwable: Child Error
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 255.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>
> and eventually after too many failures:
>
> 12/06/20 10:53:21 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201206200559_0032_m_000434
>
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1312)
>         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:105)
>         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:63)
>         at org.apache.nutch.crawl.Crawl.run(Crawl.java:140)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Crawl settings:
> seed urls - 30
> topN - 1,000,000
> depth - 10 (execution crashes at depth 6)
>
> Cluster:
> Amazon Elastic Map Reduce
>
> Machines:
> type - c1.medium
> number - 70
>
> JAVA settings:
> HADOOP_JOBTRACKER_HEAPSIZE 768
> HADOOP_NAMENODE_HEAPSIZE 512
> HADOOP_TASKTRACKER_HEAPSIZE 256
> HADOOP_DATANODE_HEAPSIZE 128
> mapred.child.java.opts -Xmx512m
> mapred.tasktracker.map.tasks.maximum 2
> mapred.tasktracker.reduce.tasks.maximum 1
>
> CrawlDB stats after the crash:
> 12/06/20 09:26:08 INFO mapred.JobClient: CrawlDB status
> 12/06/20 09:26:08 INFO mapred.JobClient: db_redir_temp=2117
> 12/06/20 09:26:08 INFO mapred.JobClient: db_redir_perm=11542
> 12/06/20 09:26:08 INFO mapred.JobClient: db_unfetched=2616086
> 12/06/20 09:26:08 INFO mapred.JobClient: db_gone=2722
> 12/06/20 09:26:08 INFO mapred.JobClient: db_fetched=238775
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-1-5-Error-Java-heap-space-during-MAP-step-of-CrawlDb-update-tp3990448p3990579.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
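For anyone replaying this thread: the child-task heap quoted above (mapred.child.java.opts -Xmx512m) is a property normally set in mapred-site.xml, or per job with -D on the command line. A hedged sketch of what raising it would look like — the 640m figure is an arbitrary illustration, not a value recommended anywhere in this thread, and any choice must leave room for two concurrent map tasks plus the tasktracker and datanode daemons within a c1.medium's ~1.7 GB of RAM:

```xml
<!-- mapred-site.xml fragment (illustrative only, not advice from this thread) -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- With mapred.tasktracker.map.tasks.maximum=2, two child JVMs of this
       size can run at once; keep 2 * Xmx + daemon heaps below physical RAM. -->
  <value>-Xmx640m</value>
</property>
```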

