[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930226#comment-13930226
 ] 

Hudson commented on MAPREDUCE-5028:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/506/])
MAPREDUCE-5028. Fixed a bug in MapTask that was causing mappers to fail when a 
large value of io.sort.mb is set. Contributed by Karthik Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576170)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataInputBuffer.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/LargeSorter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/test/MapredTestDriver.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.4.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930349#comment-13930349
 ] 

Hudson commented on MAPREDUCE-5028:
---

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/])
MAPREDUCE-5028. Fixed a bug in MapTask that was causing mappers to fail when a 
large value of io.sort.mb is set. Contributed by Karthik Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576170)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataInputBuffer.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/LargeSorter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/test/MapredTestDriver.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.4.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930420#comment-13930420
 ] 

Hudson commented on MAPREDUCE-5028:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/])
MAPREDUCE-5028. Fixed a bug in MapTask that was causing mappers to fail when a 
large value of io.sort.mb is set. Contributed by Karthik Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576170)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataInputBuffer.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/LargeSorter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/test/MapredTestDriver.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.4.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926272#comment-13926272
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Updated patch adds a unit tests and incorporates most of Vinod's suggestions. 

bq. There seems to be one more related bug w.r.t usage of 
DatInputBuffer.reset(). 
One of my earlier patches (the one that got reverted last year) was needlessly 
fixing this related bug.  

bq. Can you also cross verify Task.ValuesIterator.readNextKey() and 
readNextValue()?
Looks like you are right, but we need to be very careful and run several 
workloads before including them though (got bit by this before). If you are 
okay with it, I would like to address them in MAPREDUCE-5032, so we can get 
this piece in first as it has been tested by multiple people. 

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926363#comment-13926363
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5028:


The latest patch looks good to me. +1. The test is failing without the patch 
and passes with.

Jenkins is having issues. Trying again manually. Will commit this once Jenkins 
says okay..

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13929831#comment-13929831
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633783/mr-5028-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4409//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4409//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13929857#comment-13929857
 ] 

Hudson commented on MAPREDUCE-5028:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #5302 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5302/])
MAPREDUCE-5028. Fixed a bug in MapTask that was causing mappers to fail when a 
large value of io.sort.mb is set. Contributed by Karthik Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576170)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataInputBuffer.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/LargeSorter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/test/MapredTestDriver.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.4.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925105#comment-13925105
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5028:


We have a lot of MR jobs running as part of our test-suite. Aren't these 
similar to those (clearly I didn't run the large-sorter myself)? Why would they 
take 10 mins?

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925119#comment-13925119
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

bq. Why would they take 10 mins?
Running LargeSorter for io.sort.mb = 1536 takes  4 minutes on a relatively 
beefy machine. Running two more jobs, I am assuming would bring the total to 
around 10 mins. On not so beefy machines, it could take longer. 
 
If you insist on adding a unit test, I can try putting together one. 

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924624#comment-13924624
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

[~vinodkv] offered to look at this patch. Thanks Vinod. 

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924694#comment-13924694
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

It has been fixed in branch-1, since 1.2.0.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-07 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924705#comment-13924705
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5028:


Looked at the patch. Tx for working on it, this one is a year in the making. 
Let's do the last push!

So the main bug fix is the int overflow stuff in MapTask.java, right?

Not caused by your patch, but minor suggestions for InMemoryReader.java
 - start and length can and should be marked final.
 - memDataIn can be private and final

Can we fix the javadoc of the getLengh() API? Ideally I'd also rename the API, 
but will leave it up to you. I don't have a problem reviewing a patch with that 
rename. Anyways, back to javadoc for DataInputBuffer.getLength() and 
DataInputBuffer.Buffer.getLength(), we can say what [~chris.douglas] said above 
 - (from ByteArrayInputStream) returns the index one greater than the last 
valid character in the input stream buffer.

There seems to be one more related bug w.r.t usage of DatInputBuffer.reset(). 
Can you also cross verify Task.ValuesIterator.readNextKey() and readNextValue()?

Why not run this LargeSorter with large values for io.sort.mb as a unit test? 
I'm afraid that we may unknowingly add more such issues in future if we don't 
automatically run it as part of our test-suite. The problem will be playing 
with surefire's heap sizes, but that's about it. 

We should hop onto MAPREDUCE-5032 next as I am not sure if we are fixing 
everything here.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924745#comment-13924745
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Testing for 3 different values of io.sort.mb - different sizes - will probably 
take 10 minutes altogether. I agree the unit tests help with making sure we 
don't introduce regressions, but adding 10 minutes to each run of precommit 
build etc. seems like a huge overhead. What do you think of adding a module for 
integration/system/end-to-end tests that get run nightly, but not for 
precommit-builds etc.? That seems like a better approach in the long run - we 
can include stress tests for YARN state-store, HA etc. as well. If you think 
that would be more reasonable, I propose we do it in follow-up tasks. 

I can add unit tests along the lines of 
[repro-mr-5028.patch|https://issues.apache.org/jira/secure/attachment/12577110/repro-mr-5028.patch]
 and address other concerns with the patch.  

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922048#comment-13922048
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633012/mr-5028-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1532 javac 
compiler warnings (more than the trunk's current 1531 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4395//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4395//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4395//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian 

[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-01-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872421#comment-13872421
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Hey Carlo - sorry I have been distracted with the HA work. 

Per Arun's suggestion, I was trying to write a unit test that doesn't take too 
long but tests the fix. I spent half-a-day, but couldn't come up with anything 
tangible. I might be able to spend more time in a couple of weeks. Please feel 
free to pick this up - a combination of MR-5028_testapp.patch (Arun's patch) 
and repro-mr-5028.patch could be good starting points for tests on this. 

I deployed the fix on a cluster internally, ran it for a while, and haven't 
noticed any issues with the latest patch. Gopal seems to have validated through 
extensive logging and debugging as well.


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.4.0

 Attachments: MR-5028_testapp.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-01-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872445#comment-13872445
 ] 

Carlo Curino commented on MAPREDUCE-5028:
-

I am swamped till the end of the month for sure, so you probably will get to it 
before me... If it is still open when at that point I can help you by running 
this at scale and confirm whether your fix is working or not. 

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.4.0

 Attachments: MR-5028_testapp.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-01-14 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871700#comment-13871700
 ] 

Carlo Curino commented on MAPREDUCE-5028:
-

Hey Karthik, any news on this? 

I think we might have stumbled upon the same issue while running gridmix with 
some large parameters for io.sort.mb. 
This was with a very recent version of trunk (errors showing up in LoadJob 
during cleanup() an regular map() calls).

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.4.0

 Attachments: MR-5028_testapp.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-06-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13684446#comment-13684446
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Thanks Gopal for verifying the fix. I did write some tests using Arun's patch, 
but have been running into issues where the test crashes and doesn't report 
failure. Taking a closer look, will provide an update as soon as I have one.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.1.0-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-05-17 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660748#comment-13660748
 ] 

Gopal V commented on MAPREDUCE-5028:


I've reviewed this and the latest patch (please number the uploads for easy 
reference) works correctly  prevents the over-flow.

+1 on the patch.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-05-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659666#comment-13659666
 ] 

Gopal V commented on MAPREDUCE-5028:


[~kkambatl], I took some time to go through your patch.

Patch contains 2 different fixes, which deserve their own tests  commits. 

Good catch on the overflow with the kvindex,kvend variables. That is a bug with 
the mapper with large buffers. That is a good  clean fix.

But for the second issue, I found out it triggered when the inline Combiner is 
run when there are  3 spills in the SpillThread. This wasn't tested in 
[~acmurthy]'s test-app (but the word-count sum combiner does trigger it 
cleanly).

And there I found your fix to be suspect. So, for the sake of data I logged 
every call to reset and crawled a 13 gb log file to find out offenders in reset 
(i.e where (long)start + (long)length  input.length).

This particular back-trace stood out as a key offender. I found that to be 
significant instead of merely locating the overflow cases.

{code}
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$MRResultIterator.getKey(MapTask.java:1784)
org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:138)
{code}

I will take a closer look at that code, it might be cleaner to tackle the issue 
at the first-cause location.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-05-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659951#comment-13659951
 ] 

Gopal V commented on MAPREDUCE-5028:


I ran the tests again because something didn't seem right - my '+' operation 
was turning into a string concat operation in logging (*ugh*).

{code}
2013-05-15 18:52:47,876 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: input.length = 1342177280, start = 
687161440, length = 687161444
2013-05-15 18:52:47,876 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: count math 687161440 + 687161444 = 
1374322884
2013-05-15 18:52:47,876 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: 
org.apache.hadoop.io.DataInputBuffer$Buffer.reset(DataInputBuffer.java:58)
2013-05-15 18:52:47,876 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: 
org.apache.hadoop.io.DataInputBuffer.reset(DataInputBuffer.java:92)
2013-05-15 18:52:47,876 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: 
org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:144)
2013-05-15 18:52:47,876 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: 
org.apache.hadoop.mapreduce.task.ReduceContextImpl$ValueIterator.next(ReduceContextImpl.java:237)

2013-05-15 18:52:47,861 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: input.length = 1342177280, start = 
905211353, length = 905211357
2013-05-15 18:52:47,861 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: count math 905211353 + 905211357 = 
1810422710
2013-05-15 18:52:47,861 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: 
org.apache.hadoop.io.DataInputBuffer$Buffer.reset(DataInputBuffer.java:58)
2013-05-15 18:52:47,861 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: 
org.apache.hadoop.io.DataInputBuffer.reset(DataInputBuffer.java:92)
2013-05-15 18:52:47,861 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: 
org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:144)
2013-05-15 18:52:47,861 INFO [SpillThread] 
org.apache.hadoop.io.DataInputBuffer: 
org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
{code}

Those are wrong, definitely wrong.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
  

[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-05-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658753#comment-13658753
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Thanks Arun. Great idea to have these tests and the test-app patch looks mostly 
good - the default num_reduces should probably be 1. Let me try adding some 
tests using this.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-05-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658804#comment-13658804
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Realized the default num_reduces can be zero, was under the impression we 
wanted to test shuffle as well with this. Now, makes sense. Thanks.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-05-14 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657137#comment-13657137
 ] 

Arun C Murthy commented on MAPREDUCE-5028:
--

[~kkambatl] Sorry! I've been lax on this one. I'm on a flight now, which should 
help - looking right now.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-05-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657441#comment-13657441
 ] 

Gopal V commented on MAPREDUCE-5028:


The integer overflow in kvindex/kvstart/kvend seems to be a pretty direct 
problem for large buffers (i.e + buffer.length overflows).

Taking a closer look at the .reset() changes separately, tomorrow.


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-05-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657505#comment-13657505
 ] 

Gopal V commented on MAPREDUCE-5028:


HADOOP_CLASSPATH=./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/hadoop-mapreduce-client-jobclient-2.0.5-SNAPSHOT-tests.jar
 hadoop  org.apache.hadoop.mapreduce.LargeSorter -D io.sort.mb=1280  -D 
mapreduce.map.memory.mb=3072 -D mapred.map.child.java.opts=-server  -Xmx2248m 
-Djava.net.preferIPv4Stack=true /tmp/q.$RANDOM -r 2 

ran with extra logging

2013-05-14 20:52:52,911 INFO [main] org.apache.hadoop.mapred.MapTask: Starting 
flush of map output
2013-05-14 20:52:56,794 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: 
Finished spill 0
2013-05-14 20:52:56,795 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart, 
kvend = int(-134521988) or long(268131196)



 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, MR-5028_testapp.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-05-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656425#comment-13656425
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Hey [~acmurthy]: just checking if you had a chance to take a look at this.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-04-11 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629261#comment-13629261
 ] 

Arun C Murthy commented on MAPREDUCE-5028:
--

Sorry, been busy - thanks for the update. Looking now. Tx.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-04-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629563#comment-13629563
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577110/repro-mr-5028.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

  org.apache.hadoop.mapred.TestMapOutputBuffer
  org.apache.hadoop.mapreduce.task.reduce.TestInMemoryReader

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3520//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3520//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators

[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-04-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629574#comment-13629574
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

The test failures are a result of the new tests added in repro-mr-5028.patch 
specifically to reproduce the issue. 

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-04-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625597#comment-13625597
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Hey [~acmurthy], did you get a chance to take a look at the patch. Please let 
me know if I can provide any more information. 

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607322#comment-13607322
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12574490/mr-5028-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3440//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3440//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.

[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-20 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607339#comment-13607339
 ] 

Arun C Murthy commented on MAPREDUCE-5028:
--

I'm currently away for Hadoop Summit, can I please get a few more days to 
review this? Tx.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607494#comment-13607494
 ] 

Hudson commented on MAPREDUCE-5028:
---

Integrated in Hadoop-Yarn-trunk #161 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/161/])
Reverting MAPREDUCE-5028 (commit 1457918) (Revision 1458433)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1458433
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607561#comment-13607561
 ] 

Hudson commented on MAPREDUCE-5028:
---

Integrated in Hadoop-Hdfs-0.23-Build #559 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/559/])
svn merge -r 1458380:1458376 Reverted MAPREDUCE-5028 from branch-0.23 
(Revision 1458469)
MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl via 
tgraves) (Revision 1458380)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1458469
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java

tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1458380
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 

[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607568#comment-13607568
 ] 

Hudson commented on MAPREDUCE-5028:
---

Integrated in Hadoop-Hdfs-trunk #1350 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1350/])
Reverting MAPREDUCE-5028 (commit 1457918) (Revision 1458433)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1458433
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607624#comment-13607624
 ] 

Hudson commented on MAPREDUCE-5028:
---

Integrated in Hadoop-Mapreduce-trunk #1378 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1378/])
Reverting MAPREDUCE-5028 (commit 1457918) (Revision 1458433)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1458433
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606121#comment-13606121
 ] 

Chris Nauroth commented on MAPREDUCE-5028:
--

I'm seeing some new test failures since this change went to trunk.  One example 
is {{TestMRJobs#testDistributedCache}}.  The MR job fails with this in the 
syslog:

{code}
2013-03-18 23:39:25,065 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.io.IOException: Negative value-length not 
allowed: -6 for 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$InMemValBytes@1b1f1f12
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:243)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1597)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:694)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:762)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
{code}


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606218#comment-13606218
 ] 

Hudson commented on MAPREDUCE-5028:
---

Integrated in Hadoop-Yarn-trunk #160 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/160/])
MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl 
via tucu) (Revision 1457918)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1457918
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606306#comment-13606306
 ] 

Hudson commented on MAPREDUCE-5028:
---

Integrated in Hadoop-Hdfs-trunk #1349 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1349/])
MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl 
via tucu) (Revision 1457918)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1457918
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606321#comment-13606321
 ] 

Hudson commented on MAPREDUCE-5028:
---

Integrated in Hadoop-Mapreduce-trunk #1377 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1377/])
MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl 
via tucu) (Revision 1457918)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1457918
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606562#comment-13606562
 ] 

Chris Nauroth commented on MAPREDUCE-5028:
--

Thanks, Karthik.  I think you'll need to look into the syslog files output from 
the map tasks in the target/test name sub-directory instead of the test run's 
stdout.  The exception I mentioned was in the syslog files when I ran it.

I'm guessing that we have a subtle error in the offset calculations when 
{{InMemoryReader}} resets buffers, but I haven't pinned it down specifically 
yet.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, 
 org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606579#comment-13606579
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Thanks Chris for reporting this. I now see the errors you mention, in my second 
run. Sorry for the failures, working on identifying the root cause and a fix. 
Looks like test-patch didn't catch these early because the tests are in 
client-jobclient and the changes were in client-core.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, 
 org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606604#comment-13606604
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

I believe I found the issue, TestMRJobs passes locally for me now. However, I 
am running all MR tests locally for sanity before uploading the patch.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, 
 org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606605#comment-13606605
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5028:
---

Reverted commits from trunk and branch-2.

We need to fix test-patch, not the first time this is happening, as it is 
trying to be smarter than it is by running tests only in modules that changed, 
the correct thing should be to run tests in all dependent modules. Though a 
change in commmon would trigger ALL tests. May be a compromise would be to run 
all the tests in the corresponding component (common, hdfs, yarn, mapreduce, 
tools).


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, 
 org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606608#comment-13606608
 ] 

Hudson commented on MAPREDUCE-5028:
---

Integrated in Hadoop-trunk-Commit #3491 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3491/])
Reverting MAPREDUCE-5028 (commit 1457918) (Revision 1458433)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1458433
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, 
 org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606812#comment-13606812
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12574399/mr-5028-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3434//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3434//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was 

[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13607305#comment-13607305
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Uploaded a patch with one additional change. More details below:
# The trunk patch from March 4 worked for large values of {{io.sort.mb}}, but 
has caused a bunch of test failures.
# The trunk patch from March 19 fixes the test failures by removing an 
adjustment to a call to {{DataInputBuffer#reset()}} that wasn't supposed to be.
# However, the above mentioned adjustment was partly responsible for the 
original patch working for large values of {{io.sort.mb}}. Removing that has 
fixed test failures, but the jobs with large values of {{io.sort.mb}} started 
failing.
# The fix seems to be an adjustment to {{DataInputBuffer#reset()}} in 
{{ReduceContextImpl}}. A complementary change to {{ReduceContext}} exists in 
branch-1 patch. While working on the original patch, I did make this adjustment 
but later removed it on noticing negative offsets in jobs (the same behavior as 
in the test failures). The mess-up seems to be in my picking the wrong 
adjustment to undo. The latest patch (19 Mar 22:26) fixes this mistake.

The following steps validate the patch:
# Ran *all* tests under hadoop-mapreduce-project 
# Ran (pi 4 1000), (teragen 4GB data - 4 mappers), (wordcount on teragen 
output) on a pseudo-dist cluster with dfs.block.size=64MB, java.opts=1GB, 
io.sort.mb=256MB
# Ran (pi 4 1000), (teragen 4GB data - 2 mappers), (wordcount on teragen 
output) on a pseudo-dist cluster with dfs.block.size=2GB, java.opts=2.5GB, 
io.sort.mb=1280MB

Please suggest any additional testing that you think is required.

Thanks again Chris for immediately notifying us of the issue. Sorry Alejandro 
and Bobby for the additional trouble.


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 

[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605486#comment-13605486
 ] 

Hudson commented on MAPREDUCE-5028:
---

Integrated in Hadoop-trunk-Commit #3487 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3487/])
MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl 
via tucu) (Revision 1457918)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1457918
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java


 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 1.2.0, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13603458#comment-13603458
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

[~chris.douglas], will you be able to take a look at the latest patch? Your 
valuable insights from MAPREDUCE-64 experience will surely help.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-05 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13594174#comment-13594174
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5028:
---

Thanks Karthik. I've committed the patch for branch-1. Thanks Chris for 
reviewing it.

+1 for the trunk patch. I'll wait for a bit to see if there are comments for 
others before committing it.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592854#comment-13592854
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571997/mr-5028-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3383//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3383//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: 

[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-28 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589413#comment-13589413
 ] 

Chris Douglas commented on MAPREDUCE-5028:
--

I'd rather leave {{DataInputBuffer}} alone. But if the current doc really 
bothers you, please use formatting consistent with the rest of the project.

+1 on the changes to 1.x, but changing the scope of the issue to fix all 
misuses of {{DataInputBuffer::reset}} in 2.x and trunk would be preferred. This 
issue doesn't cover all errors for large sort buffers, and separate issues for 
each branch are avoidable. But I won't block a bug fix.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588656#comment-13588656
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

[~chris.douglas], do you think the newly added comments are fine and the patch 
good to go?

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587298#comment-13587298
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570924/mr-5028-branch1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3360//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587448#comment-13587448
 ] 

Chris Douglas commented on MAPREDUCE-5028:
--

bq. getLength() returns the size of the entire buffer, and not just the 
remaining part of the buffer

Just to be clear, it's not the size of the backing array, but the index one 
greater than the last valid character in the input stream buffer. 
(ByteArrayInputStream) The change to {{DataInputBuffer}} implies the former, 
which is inaccurate.

The corrects several misuses of DataInputBuffer, which is great. There's 
another misuse at {{ReduceContextImpl.ValueIterator::next}} that could be 
included with these changes.

Most of this code doesn't check for overflow; it wasn't written for extremely 
large buffers. Just glancing at related code, MapTask.InMemValBytes contains 
code that could overflow and I'm sure there are others. Filed MAPREDUCE-5032.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587743#comment-13587743
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Thanks for your comments, Chris.

bq. Just to be clear, it's not the size of the backing array, but the index 
one greater than the last valid character in the input stream buffer. 
(ByteArrayInputStream) The change to DataInputBuffer implies the former, which 
is inaccurate.

Removed that comment, and added comments to {{DataInputBuffer#reset()}} to 
reflect {{ByteArrayInputStream}}.

bq. There's another misuse at ReduceContextImpl.ValueIterator::next that could 
be included with these changes.
If I am not mistaken, ReduceContextImpl is only in trunk versions. This patch 
is only for branch-1. Filed MAPREDUCE-5031 earlier to address the slightly 
different issues on trunk, can mark it as a duplicate to the one that you 
created. Have a half-baked patch, I can submit that there.



 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587744#comment-13587744
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

The -1 from Hadoop QA was because the uploaded branch is for patch 1. I have 
verified ant test-core passes.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587821#comment-13587821
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571090/mr-5028-branch1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3365//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-25 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586708#comment-13586708
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

http://comments.gmane.org/gmane.comp.java.hadoop.mapreduce.user/2485 looks like 
the same issue.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical

 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}
 Marked branch-0.23 and branch-2 also because the offending code seems to 
 exist there too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira