We were getting this exact same problem in a really simple MR job, on input
produced from a known-working MR job.
It seemed to happen intermittently, and we couldn't figure out what was up.
In the end we solved the problem by increasing the number of maps (80 to
200; this is a 6-node, 12-core cluster). Apparently, QuickSort can have
problems with big chunks of pre-sorted data: too much recursion, I believe,
because lopsided pivots make the recursion depth grow with the input size
rather than with its logarithm.
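To illustrate what I mean, here is a generic sketch in plain Java (not
Hadoop's actual org.apache.hadoop.util.QuickSort; names are made up):

    public class QuickSortDepthDemo {

        // Lomuto partition with the last element as pivot: on
        // already-sorted input every element lands on one side, so the
        // splits are maximally lopsided.
        static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi];
            int i = lo;
            for (int j = lo; j < hi; j++) {
                if (a[j] < pivot) {
                    int t = a[i]; a[i] = a[j]; a[j] = t;
                    i++;
                }
            }
            int t = a[i]; a[i] = a[hi]; a[hi] = t;
            return i;
        }

        // Naive version: recursion depth is O(n) on sorted input, which
        // is what overflows the stack.
        static void naiveSort(int[] a, int lo, int hi) {
            if (lo >= hi) return;
            int p = partition(a, lo, hi);
            naiveSort(a, lo, p - 1);
            naiveSort(a, p + 1, hi);
        }

        // The usual fix: recurse only into the smaller partition and
        // iterate on the larger one, bounding stack depth to O(log n).
        static void saferSort(int[] a, int lo, int hi) {
            while (lo < hi) {
                int p = partition(a, lo, hi);
                if (p - lo < hi - p) {
                    saferSort(a, lo, p - 1);
                    lo = p + 1;
                } else {
                    saferSort(a, p + 1, hi);
                    hi = p - 1;
                }
            }
        }
    }

Splitting the input across more maps presumably just keeps each in-memory
sort small enough that even a bad case stays shallow.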
This might not be what's going on in your case, and maybe your cluster is
at some other scale, but this worked for us (in a setup with Hadoop 0.17).
Good luck!
-Colin
On Mon, Jun 2, 2008 at 3:18 PM, Devaraj Das [EMAIL PROTECTED] wrote:
Hi, do you have a testcase that we can run to reproduce this? Thanks!
-----Original Message-----
From: jkupferman [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 02, 2008 9:22 AM
To: core-user@hadoop.apache.org
Subject: Stack Overflow When Running Job
Hi everyone,
I have a job that keeps failing with stack overflows, and I really don't
see how that is happening. The job runs for about 20-30 minutes before one
task errors; then a few more error out and the job fails.
I am running Hadoop 0.17, and I've tried lowering these settings, to no
avail:

    io.sort.factor = 50
    io.seqfile.sorter.recordlimit = 50
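For reference, a minimal sketch of setting these programmatically, assuming
the 0.17-era JobConf API (the class name below is made up):

    import org.apache.hadoop.mapred.JobConf;

    public class JobSetupSketch {
        public static JobConf configure() {
            JobConf conf = new JobConf();
            // Number of streams merged at once while sorting spill files.
            conf.setInt("io.sort.factor", 50);
            // Record limit for the SequenceFile sorter.
            conf.setInt("io.seqfile.sorter.recordlimit", 50);
            return conf;
        }
    }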
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:594)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:576)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
    at Group.write(Group.java:68)
    at GroupPair.write(GroupPair.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:434)
    at MyMapper.map(MyMapper.java:27)
    at MyMapper.map(MyMapper.java:10)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
Caused by: java.lang.StackOverflowError
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at Group.readFields(Group.java:62)
    at GroupPair.readFields(GroupPair.java:60)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:91)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:494)
    at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:29)
    at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
    at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
    (the above line repeated 200x)
I defined a WritableComparable called GroupPair which simply holds two
Group objects, each of which contains two integers.
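Roughly, the types look like this (a sketch reconstructed from the
description and the trace; the field names are placeholders):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    class Group implements WritableComparable<Group> {
        private int first;
        private int second;

        public void write(DataOutput out) throws IOException {
            out.writeInt(first);   // Group.write -> writeInt, as in the trace
            out.writeInt(second);
        }

        public void readFields(DataInput in) throws IOException {
            first = in.readInt();  // Group.readFields -> readInt, as in the trace
            second = in.readInt();
        }

        public int compareTo(Group o) {
            if (first != o.first) return first < o.first ? -1 : 1;
            if (second != o.second) return second < o.second ? -1 : 1;
            return 0;
        }
    }

    class GroupPair implements WritableComparable<GroupPair> {
        private Group left = new Group();
        private Group right = new Group();

        public void write(DataOutput out) throws IOException {
            left.write(out);       // GroupPair.write delegates to Group.write
            right.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            left.readFields(in);
            right.readFields(in);
        }

        public int compareTo(GroupPair o) {
            int c = left.compareTo(o.left);
            return c != 0 ? c : right.compareTo(o.right);
        }
    }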
I fail to see how QuickSort could recurse 200+ times, since with balanced
partitions that depth would require on the order of 2^200 entries, far more
than the 500 million that had been output at that point.
How is this even possible? And what can be done to fix this?