> mapred.child.java.opts -Xmx512m

Looking at your logs, it seems 512 MB might be too small for the task. You could try a few things:

- Increase the heap of the child (at the expense of running fewer tasks per node, depending on how much RAM you have).
- Decrease io.sort.mb. I think 200 MB is too much for just 512 MB of heap.
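To put rough numbers on that, here is a minimal sketch of the arithmetic in Python. The function names are mine, and the 5% record share, 16 bytes per record entry and 80% spill threshold are assumptions: I picked them because they reproduce the "data buffer" and "record buffer" lines in the quoted log, not because I have checked them against your configuration.

```python
# Sketch: estimate the map-side sort buffer layout from io.sort.mb.
# Assumed (not confirmed for this cluster): 5% of the buffer is reserved for
# record accounting, each record entry takes 16 bytes, spill threshold is 80%.

MB = 1 << 20


def sort_buffer_layout(io_sort_mb, record_percent=5, spill_percent=80, record_size=16):
    """Return (soft limit, capacity) pairs for the data and record buffers."""
    total = io_sort_mb * MB
    # Bytes set aside for record accounting, rounded down to whole entries.
    record_bytes = total * record_percent // 100
    record_bytes -= record_bytes % record_size
    data_bytes = total - record_bytes
    records = record_bytes // record_size
    return {
        "total_bytes": total,
        "data_buffer": (data_bytes * spill_percent // 100, data_bytes),
        "record_buffer": (records * spill_percent // 100, records),
    }


def heap_share(io_sort_mb, heap_mb):
    """Fraction of the child heap taken by the sort buffer alone."""
    return io_sort_mb / heap_mb


if __name__ == "__main__":
    layout = sort_buffer_layout(200)
    print("data buffer = %d/%d" % layout["data_buffer"])
    print("record buffer = %d/%d" % layout["record_buffer"])
    print("share of a 512 MB heap: %.0f%%" % (100 * heap_share(200, 512)))
```

With io.sort.mb = 200 this gives data buffer = 159383552/199229440 and record buffer = 524288/655360, the same values as in the log, so the sort buffer alone is about 39% of a 512 MB heap. The log also shows tasks being started "in the same JVM as that of the first task", and the OutOfMemoryError is raised in MapOutputBuffer.<init>, so my guess is that a reused JVM cannot find room for another buffer of that size.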
On Thu, Jun 21, 2012 at 4:15 AM, sidbatra <[email protected]> wrote:

> Ok, here are the syslogs from the individual machines. They all have a
> stack trace similar to this
>
> 2012-06-21 00:28:40,838 WARN org.apache.hadoop.conf.Configuration (main): DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 2012-06-21 00:28:41,826 INFO org.apache.hadoop.util.NativeCodeLoader (main): Loaded the native-hadoop library
> 2012-06-21 00:28:42,043 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name ugi already exists!
> 2012-06-21 00:28:42,133 INFO org.apache.hadoop.mapred.MapTask (main): Host name: ip-10-244-113-139.ec2.internal
> 2012-06-21 00:28:42,164 INFO org.apache.hadoop.util.ProcessTree (main): setsid exited with exit code 0
> 2012-06-21 00:28:42,207 INFO org.apache.hadoop.mapred.Task (main): Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@17e4dee
> 2012-06-21 00:28:42,303 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory (main): Successfully loaded & initialized native-zlib library
> 2012-06-21 00:28:42,303 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new decompressor
> 2012-06-21 00:28:42,313 INFO org.apache.hadoop.mapred.MapTask (main): numReduceTasks: 105
> 2012-06-21 00:28:42,319 INFO org.apache.hadoop.mapred.MapTask (main): io.sort.mb = 200
> 2012-06-21 00:28:43,081 INFO org.apache.hadoop.mapred.MapTask (main): data buffer = 159383552/199229440
> 2012-06-21 00:28:43,082 INFO org.apache.hadoop.mapred.MapTask (main): record buffer = 524288/655360
> 2012-06-21 00:28:43,102 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native library is available
> 2012-06-21 00:28:43,102 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy (main): Snappy native library loaded
> 2012-06-21 00:28:52,114 INFO org.apache.hadoop.mapred.MapTask (main): Starting flush of map output
> 2012-06-21 00:28:52,289 INFO org.apache.hadoop.io.compress.CodecPool (main): Got brand-new compressor
> 2012-06-21 00:28:53,668 INFO org.apache.hadoop.mapred.MapTask (main): Finished spill 0
> 2012-06-21 00:28:53,673 INFO org.apache.hadoop.mapred.Task (main): Task:attempt_201206202314_0017_m_000055_0 is done. And is in the process of commiting
> 2012-06-21 00:28:54,591 INFO org.apache.hadoop.mapred.Task (main): Task 'attempt_201206202314_0017_m_000055_0' done.
> 2012-06-21 00:28:54,594 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2012-06-21 00:28:54,707 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2012-06-21 00:28:54,707 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got UserName hadoop for UID 106 from the native implementation
> 2012-06-21 00:28:55,760 INFO org.apache.hadoop.mapred.TaskLog (main): Starting logging for a new task attempt_201206202314_0017_m_000114_0 in the same JVM as that of the first task /mnt/var/log/hadoop/userlogs/job_201206202314_0017/attempt_201206202314_0017_m_000055_0
> 2012-06-21 00:28:55,761 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): MapTask metrics system already initialized!
> 2012-06-21 00:28:55,761 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name jvm already exists!
> 2012-06-21 00:28:55,766 INFO org.apache.hadoop.mapred.MapTask (main): Host name: ip-10-244-113-139.ec2.internal
> 2012-06-21 00:28:55,773 INFO org.apache.hadoop.mapred.Task (main): Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@bfed5a
> 2012-06-21 00:28:55,862 INFO org.apache.hadoop.mapred.MapTask (main): numReduceTasks: 105
> 2012-06-21 00:28:55,863 INFO org.apache.hadoop.mapred.MapTask (main): io.sort.mb = 200
> 2012-06-21 00:28:57,804 INFO org.apache.hadoop.mapred.MapTask (main): data buffer = 159383552/199229440
> 2012-06-21 00:28:57,804 INFO org.apache.hadoop.mapred.MapTask (main): record buffer = 524288/655360
> 2012-06-21 00:28:59,370 INFO org.apache.hadoop.mapred.MapTask (main): Starting flush of map output
> 2012-06-21 00:28:59,452 INFO org.apache.hadoop.mapred.MapTask (main): Finished spill 0
> 2012-06-21 00:28:59,455 INFO org.apache.hadoop.mapred.Task (main): Task:attempt_201206202314_0017_m_000114_0 is done. And is in the process of commiting
> 2012-06-21 00:29:01,818 INFO org.apache.hadoop.mapred.Task (main): Task 'attempt_201206202314_0017_m_000114_0' done.
> 2012-06-21 00:29:01,820 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2012-06-21 00:29:05,408 INFO org.apache.hadoop.mapred.TaskLog (main): Starting logging for a new task attempt_201206202314_0017_m_000130_0 in the same JVM as that of the first task /mnt/var/log/hadoop/userlogs/job_201206202314_0017/attempt_201206202314_0017_m_000055_0
> 2012-06-21 00:29:05,409 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): MapTask metrics system already initialized!
> 2012-06-21 00:29:05,409 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name jvm already exists!
> 2012-06-21 00:29:05,412 INFO org.apache.hadoop.mapred.MapTask (main): Host name: ip-10-244-113-139.ec2.internal
> 2012-06-21 00:29:05,419 INFO org.apache.hadoop.mapred.Task (main): Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@13fba1
> 2012-06-21 00:29:05,459 INFO org.apache.hadoop.mapred.MapTask (main): numReduceTasks: 105
> 2012-06-21 00:29:05,460 INFO org.apache.hadoop.mapred.MapTask (main): io.sort.mb = 200
> 2012-06-21 00:29:07,739 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2012-06-21 00:29:07,786 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> Cleaner format on pastie:
> Stack trace 1 - http://pastie.org/pastes/4123975/text
> Stack trace 2 - http://pastie.org/pastes/4123976/text
>
> I'll really appreciate some help on this. Please let me know if there are
> any other logs that'll help debug this.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-1-5-Error-Java-heap-space-during-MAP-step-of-CrawlDb-update-tp3990448p3990627.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

