The log you provided doesn't look like the actual mapper log; can you check again? Besides the output for the main class, the job keeps separate logs for each map and reduce task.
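Each failed attempt ID quoted below encodes its job ID, and on Hadoop 0.20/1.x (the generation EMR ran at the time) the tasktracker that executed an attempt keeps that attempt's stdout, stderr, and syslog under a userlogs directory. A minimal sketch of deriving those paths from one of the reported attempt IDs — the exact directory layout varies by Hadoop version and EMR AMI, so treat the paths as an assumption:

```shell
# One of the failed attempt IDs from the report below.
ATTEMPT="attempt_201206200559_0032_m_000313_0"

# The job ID is embedded in the attempt ID (fields 2 and 3, underscore-delimited).
JOB_ID="job_$(echo "$ATTEMPT" | cut -d_ -f2,3)"

# On the node that ran the attempt, each attempt directory holds stdout,
# stderr, and syslog; an OOM stack trace usually lands in syslog or stderr.
# /var/log/hadoop is an assumed default when HADOOP_LOG_DIR is unset.
LOG_PATH="${HADOOP_LOG_DIR:-/var/log/hadoop}/userlogs/${ATTEMPT}/syslog"

echo "$JOB_ID"
echo "$LOG_PATH"
```

The same per-attempt logs are also reachable through the JobTracker web UI by clicking through from the failed task to its attempts.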
-----Original message-----
> From: sidbatra <[email protected]>
> Sent: Wed 20-Jun-2012 20:29
> To: [email protected]
> Subject: Re: Nutch 1.5 - "Error: Java heap space" during MAP step of CrawlDb update
>
> thanks for the reply.
>
> The MAP tasks are the ones failing and most of them simply fail with:
>
> attempt_201206200559_0032_m_000313_0 task_201206200559_0032_m_000313
> 10.76.89.196 FAILED
> Error: Java heap space
>
> Some of the MAP tasks have a trace as follows:
>
> attempt_201206200559_0032_m_000322_1 task_201206200559_0032_m_000322
> 10.242.110.38 FAILED
> java.lang.Throwable: Child Error
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 255.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>
> and eventually after too many failures:
>
> 12/06/20 10:53:21 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201206200559_0032_m_000434
>
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1312)
>         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:105)
>         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:63)
>         at org.apache.nutch.crawl.Crawl.run(Crawl.java:140)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Crawl settings:
> seed urls - 30
> topN - 1,000,000
> depth - 10 (execution crashes at depth 6)
>
> Cluster:
> Amazon Elastic Map Reduce
>
> Machines:
> type - c1.medium
> number - 70
>
> JAVA settings:
> HADOOP_JOBTRACKER_HEAPSIZE 768
> HADOOP_NAMENODE_HEAPSIZE 512
> HADOOP_TASKTRACKER_HEAPSIZE 256
> HADOOP_DATANODE_HEAPSIZE 128
> mapred.child.java.opts -Xmx512m
> mapred.tasktracker.map.tasks.maximum 2
> mapred.tasktracker.reduce.tasks.maximum 1
>
> CrawlDB stats after the crash:
> 12/06/20 09:26:08 INFO mapred.JobClient: CrawlDB status
> 12/06/20 09:26:08 INFO mapred.JobClient: db_redir_temp=2117
> 12/06/20 09:26:08 INFO mapred.JobClient: db_redir_perm=11542
> 12/06/20 09:26:08 INFO mapred.JobClient: db_unfetched=2616086
> 12/06/20 09:26:08 INFO mapred.JobClient: db_gone=2722
> 12/06/20 09:26:08 INFO mapred.JobClient: db_fetched=238775
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-1-5-Error-Java-heap-space-during-MAP-step-of-CrawlDb-update-tp3990448p3990579.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
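For anyone replaying this thread: the child-task heap quoted above (mapred.child.java.opts -Xmx512m) is a property normally set in mapred-site.xml, or per job with -D on the command line. A hedged sketch of what raising it would look like — the 640m figure is an arbitrary illustration, not a value recommended anywhere in this thread, and any choice must leave room for two concurrent map tasks plus the tasktracker and datanode daemons within a c1.medium's ~1.7 GB of RAM:

```xml
<!-- mapred-site.xml fragment (illustrative only, not advice from this thread) -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- With mapred.tasktracker.map.tasks.maximum=2, two child JVMs of this
       size can run at once; keep 2 * Xmx + daemon heaps below physical RAM. -->
  <value>-Xmx640m</value>
</property>
```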

