OK, here are the syslogs from the individual machines. They all have a stack
trace similar to this:
2012-06-21 00:28:40,838 WARN org.apache.hadoop.conf.Configuration (main):
DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml
is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml
to override properties of core-default.xml, mapred-default.xml and
hdfs-default.xml respectively
2012-06-21 00:28:41,826 INFO org.apache.hadoop.util.NativeCodeLoader (main):
Loaded the native-hadoop library
2012-06-21 00:28:42,043 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name ugi
already exists!
2012-06-21 00:28:42,133 INFO org.apache.hadoop.mapred.MapTask (main): Host
name: ip-10-244-113-139.ec2.internal
2012-06-21 00:28:42,164 INFO org.apache.hadoop.util.ProcessTree (main):
setsid exited with exit code 0
2012-06-21 00:28:42,207 INFO org.apache.hadoop.mapred.Task (main): Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@17e4dee
2012-06-21 00:28:42,303 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory
(main): Successfully loaded & initialized native-zlib library
2012-06-21 00:28:42,303 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-21 00:28:42,313 INFO org.apache.hadoop.mapred.MapTask (main):
numReduceTasks: 105
2012-06-21 00:28:42,319 INFO org.apache.hadoop.mapred.MapTask (main):
io.sort.mb = 200
2012-06-21 00:28:43,081 INFO org.apache.hadoop.mapred.MapTask (main): data
buffer = 159383552/199229440
2012-06-21 00:28:43,082 INFO org.apache.hadoop.mapred.MapTask (main): record
buffer = 524288/655360
2012-06-21 00:28:43,102 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy
(main): Snappy native library is available
2012-06-21 00:28:43,102 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy
(main): Snappy native library loaded
2012-06-21 00:28:52,114 INFO org.apache.hadoop.mapred.MapTask (main):
Starting flush of map output
2012-06-21 00:28:52,289 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new compressor
2012-06-21 00:28:53,668 INFO org.apache.hadoop.mapred.MapTask (main):
Finished spill 0
2012-06-21 00:28:53,673 INFO org.apache.hadoop.mapred.Task (main):
Task:attempt_201206202314_0017_m_000055_0 is done. And is in the process of
commiting
2012-06-21 00:28:54,591 INFO org.apache.hadoop.mapred.Task (main): Task
'attempt_201206202314_0017_m_000055_0' done.
2012-06-21 00:28:54,594 INFO org.apache.hadoop.mapred.TaskLogsTruncater
(main): Initializing logs' truncater with mapRetainSize=-1 and
reduceRetainSize=-1
2012-06-21 00:28:54,707 INFO org.apache.hadoop.io.nativeio.NativeIO (main):
Initialized cache for UID to User mapping with a cache timeout of 14400
seconds.
2012-06-21 00:28:54,707 INFO org.apache.hadoop.io.nativeio.NativeIO (main):
Got UserName hadoop for UID 106 from the native implementation
2012-06-21 00:28:55,760 INFO org.apache.hadoop.mapred.TaskLog (main):
Starting logging for a new task attempt_201206202314_0017_m_000114_0 in the
same JVM as that of the first task
/mnt/var/log/hadoop/userlogs/job_201206202314_0017/attempt_201206202314_0017_m_000055_0
2012-06-21 00:28:55,761 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): MapTask metrics
system already initialized!
2012-06-21 00:28:55,761 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name jvm
already exists!
2012-06-21 00:28:55,766 INFO org.apache.hadoop.mapred.MapTask (main): Host
name: ip-10-244-113-139.ec2.internal
2012-06-21 00:28:55,773 INFO org.apache.hadoop.mapred.Task (main): Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@bfed5a
2012-06-21 00:28:55,862 INFO org.apache.hadoop.mapred.MapTask (main):
numReduceTasks: 105
2012-06-21 00:28:55,863 INFO org.apache.hadoop.mapred.MapTask (main):
io.sort.mb = 200
2012-06-21 00:28:57,804 INFO org.apache.hadoop.mapred.MapTask (main): data
buffer = 159383552/199229440
2012-06-21 00:28:57,804 INFO org.apache.hadoop.mapred.MapTask (main): record
buffer = 524288/655360
2012-06-21 00:28:59,370 INFO org.apache.hadoop.mapred.MapTask (main):
Starting flush of map output
2012-06-21 00:28:59,452 INFO org.apache.hadoop.mapred.MapTask (main):
Finished spill 0
2012-06-21 00:28:59,455 INFO org.apache.hadoop.mapred.Task (main):
Task:attempt_201206202314_0017_m_000114_0 is done. And is in the process of
commiting
2012-06-21 00:29:01,818 INFO org.apache.hadoop.mapred.Task (main): Task
'attempt_201206202314_0017_m_000114_0' done.
2012-06-21 00:29:01,820 INFO org.apache.hadoop.mapred.TaskLogsTruncater
(main): Initializing logs' truncater with mapRetainSize=-1 and
reduceRetainSize=-1
2012-06-21 00:29:05,408 INFO org.apache.hadoop.mapred.TaskLog (main):
Starting logging for a new task attempt_201206202314_0017_m_000130_0 in the
same JVM as that of the first task
/mnt/var/log/hadoop/userlogs/job_201206202314_0017/attempt_201206202314_0017_m_000055_0
2012-06-21 00:29:05,409 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): MapTask metrics
system already initialized!
2012-06-21 00:29:05,409 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name jvm
already exists!
2012-06-21 00:29:05,412 INFO org.apache.hadoop.mapred.MapTask (main): Host
name: ip-10-244-113-139.ec2.internal
2012-06-21 00:29:05,419 INFO org.apache.hadoop.mapred.Task (main): Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@13fba1
2012-06-21 00:29:05,459 INFO org.apache.hadoop.mapred.MapTask (main):
numReduceTasks: 105
2012-06-21 00:29:05,460 INFO org.apache.hadoop.mapred.MapTask (main):
io.sort.mb = 200
2012-06-21 00:29:07,739 INFO org.apache.hadoop.mapred.TaskLogsTruncater
(main): Initializing logs' truncater with mapRetainSize=-1 and
reduceRetainSize=-1
2012-06-21 00:29:07,786 FATAL org.apache.hadoop.mapred.Child (main): Error
running child : java.lang.OutOfMemoryError: Java heap space
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Cleaner format on pastie:
Stack trace 1 - http://pastie.org/pastes/4123975/text
Stack trace 2 - http://pastie.org/pastes/4123976/text
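As a sanity check on the numbers above, the "data buffer" and "record buffer" lines seem to follow directly from io.sort.mb = 200. The sketch below is only a rough reconstruction: the 5% record fraction, the 80% spill threshold, and the 16 bytes of bookkeeping per record are assumed default values, not something shown in the log, and the function name is made up for illustration.

```python
# Rough reconstruction of the map-side sort buffer sizes printed in the log.
# Assumptions (not taken from the log): 5% of the buffer is reserved for
# record bookkeeping, spills start at 80% full, 16 bytes are tracked per record.
RECORD_PERCENT = 5        # assumed default, in percent
SPILL_PERCENT = 80        # assumed default, in percent
BYTES_PER_RECORD = 16     # assumed bookkeeping size per record


def sort_buffer_sizes(sort_mb):
    """Return the soft limit and capacity of the data and record buffers."""
    total = sort_mb << 20                               # MB -> bytes
    record_bytes = total * RECORD_PERCENT // 100
    record_bytes -= record_bytes % BYTES_PER_RECORD     # align to record size
    data_bytes = total - record_bytes                   # serialized key/value data
    records = record_bytes // BYTES_PER_RECORD
    return {
        "data_soft": data_bytes * SPILL_PERCENT // 100,
        "data_total": data_bytes,
        "record_soft": records * SPILL_PERCENT // 100,
        "record_total": records,
    }


sizes = sort_buffer_sizes(200)   # io.sort.mb = 200, as reported in the log
print("data buffer = {data_soft}/{data_total}".format(**sizes))
print("record buffer = {record_soft}/{record_total}".format(**sizes))
```

If those assumptions hold, each map task asks for roughly 200 MB of heap for this buffer alone when MapOutputBuffer is constructed, which is the frame where the OutOfMemoryError is thrown.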
I'd really appreciate some help with this. Please let me know if any other
logs would help debug it.