[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-26 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586946#comment-13586946
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~snihalani], I think you referred to an old patch.
Please look at MAPREDUCE-4974.4.patch.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing the code in initialize(), if we 
 have compressionCodecs & codec instantiated only when the input is compressed. 
 Meanwhile, Gelesh George Omathil added that we could avoid the null check of 
 key & value. This would save time, since the null check is done for every 
 next key/value generation. The intention is to instantiate only once and 
 avoid an NPE as well. Both could be met if key & value are initialized in the 
 initialize() method. We both have worked on it.
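For illustration, the pattern proposed above can be sketched with simplified, hypothetical stand-ins for the Hadoop classes (plain Java, not the actual LineRecordReader API): key and value are created once in initialize(), so nextKeyValue() needs no per-record null check, and codec setup would happen only for compressed input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Simplified stand-in for the pattern described above: key/value are
// instantiated once in initialize(), so nextKeyValue() has no per-record
// null checks; decompression setup is reached only for compressed input.
class SimpleLineRecordReader {
    private BufferedReader in;
    private long key;      // stand-in for LongWritable
    private String value;  // stand-in for Text
    private long pos;

    public void initialize(Reader input, boolean compressed) {
        // instantiate once here, instead of null-checking on every record
        this.key = 0;
        this.value = "";
        Reader source = input;
        if (compressed) {
            // real code would wrap the stream with a decompressor here,
            // and only here -- no codec objects for plain-text input
        }
        this.in = new BufferedReader(source);
    }

    public boolean nextKeyValue() throws IOException {
        String line = in.readLine();
        if (line == null) {
            return false;
        }
        key = pos;                 // byte offset of the line start
        value = line;
        pos += line.length() + 1;  // +1 for the newline
        return true;
    }

    public long getCurrentKey() { return key; }
    public String getCurrentValue() { return value; }
}
```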

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-26 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4974:
--

Attachment: (was: MAPREDUCE-4974.1.patch)

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing the code in initialize(), if we 
 have compressionCodecs & codec instantiated only when the input is compressed. 
 Meanwhile, Gelesh George Omathil added that we could avoid the null check of 
 key & value. This would save time, since the null check is done for every 
 next key/value generation. The intention is to instantiate only once and 
 avoid an NPE as well. Both could be met if key & value are initialized in the 
 initialize() method. We both have worked on it.



[jira] [Commented] (MAPREDUCE-5029) Recursively take all files in the directories of a root directory

2013-02-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586994#comment-13586994
 ] 

Steve Loughran commented on MAPREDUCE-5029:
---

Please can you check whether this problem still exists on Hadoop trunk. There 
have been changes to do iterative enumeration of directory contents rather 
than the ls *; see {{FileSystem.listFiles()}}. If this method isn't being 
used in MR jobs, maybe it could be.
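For reference, the iterative shape mentioned above, enumerating directories from an explicit work queue instead of recursing, can be sketched in plain Java. This is a hypothetical stand-in using java.nio rather than the Hadoop FileSystem API, just to show the pattern:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Queue-based enumeration of every file under a root: no recursion, so a
// tree with thousands of subdirectories cannot overflow the call stack,
// and directories are opened one at a time rather than all at once.
class IterativeLister {
    static List<Path> listFiles(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        Deque<Path> dirs = new ArrayDeque<>();
        dirs.push(root);
        while (!dirs.isEmpty()) {
            Path dir = dirs.pop();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path entry : stream) {
                    if (Files.isDirectory(entry)) {
                        dirs.push(entry);  // visit later, iteratively
                    } else {
                        files.add(entry);
                    }
                }
            }
        }
        return files;
    }
}
```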

 Recursively take all files in the directories of a root directory
 -

 Key: MAPREDUCE-5029
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5029
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.2
Reporter: Abhilash S R

 Suppose we have a root directory with thousands of subdirectories, and each 
 subdirectory can contain hundreds of files. When the root directory is given 
 as the input path of a map-reduce job, the program crashes because of the 
 subdirectories under the root. If this feature is included in the latest 
 version, it will be very helpful for programmers.



[jira] [Created] (MAPREDUCE-5030) YARNClientImpl logging too aggressively

2013-02-26 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created MAPREDUCE-5030:
---

 Summary: YARNClientImpl logging too aggressively
 Key: MAPREDUCE-5030
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5030
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial


Every time we execute bin/hadoop job etc, the following two lines show up: 

{noformat}
13/02/26 07:05:19 INFO service.AbstractService: 
Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/26 07:05:20 INFO service.AbstractService: 
Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.

{noformat}
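One conventional way to hide these lines is a log4j override that raises the offending logger's level. The logger name below is assumed from the truncated class name in the message and may need adjusting:

```properties
# Hypothetical log4j override (logger name assumed, verify against your build):
# raise AbstractService to WARN so the INFO lifecycle lines are suppressed.
log4j.logger.org.apache.hadoop.yarn.service.AbstractService=WARN
```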



[jira] [Created] (MAPREDUCE-5031) Maps hitting IndexOutOfBoundsException for higher values of mapreduce.task.io.sort.mb

2013-02-26 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created MAPREDUCE-5031:
---

 Summary: Maps hitting IndexOutOfBoundsException for higher values 
of mapreduce.task.io.sort.mb
 Key: MAPREDUCE-5031
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5031
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.5, 2.0.3-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


While trying to reproduce MAPREDUCE-5028 on trunk, ran into what seems to be a 
different issue. To reproduce:

Pseudo-dist mode: mapreduce.{map,reduce}.memory.mb=2048, 
mapreduce.{map,reduce}.java.opts=-Xmx2048m, mapreduce.task.io.sort.mb=1280

The map tasks fail with the following error: 
{noformat}
Error: java.lang.IndexOutOfBoundsException at 
java.nio.Buffer.checkIndex(Buffer.java:512) at 
java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:113) at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1141) at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:686) at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
 at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
 at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:47) 
at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:36) 
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:339) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1488)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
{noformat}
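A plausible (unconfirmed here) mechanism for failures at large mapreduce.task.io.sort.mb values is 32-bit overflow in buffer-offset arithmetic: 1280 MB of buffer still fits in a Java int, but derived offsets can wrap negative and then index out of bounds:

```java
// Demonstrates how byte-offset arithmetic near 2^31 wraps negative in
// Java int math; a wrapped (negative) offset used as a buffer index would
// surface as an IndexOutOfBoundsException.
class SortMbOverflow {
    public static void main(String[] args) {
        int sortMb = 1280;
        int bufBytes = sortMb * 1024 * 1024;  // 1,342,177,280: fits in an int
        int doubled = bufBytes * 2;           // exceeds Integer.MAX_VALUE, wraps
        System.out.println(bufBytes);
        System.out.println(doubled);          // negative: a wrapped offset
    }
}
```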




[jira] [Updated] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5028:


Affects Version/s: (was: 0.23.5)
   (was: 2.0.3-alpha)

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}
 Marked branch-0.23 and branch-2 also because the offending code seems to 
 exist there too.



[jira] [Updated] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5028:


Target Version/s: 1.2.0  (was: 1.2.0, 0.23.7, 2.0.4-beta)

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}
 Marked branch-0.23 and branch-2 also because the offending code seems to 
 exist there too.



[jira] [Updated] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5028:


Description: 
Verified the problem exists on branch-1 with the following configuration:

Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
io.sort.mb=1280, dfs.block.size=2147483648

Run teragen to generate 4 GB data
Maps fail when you run wordcount on this configuration with the following 
error: 
{noformat}
java.io.IOException: Spill failed
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at 
org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
at 
org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at 
org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
at 
org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
at 
org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
{noformat}

  was:
Verified the problem exists on branch-1 with the following configuration:

Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
io.sort.mb=1280, dfs.block.size=2147483648

Run teragen to generate 4 GB data
Maps fail when you run wordcount on this configuration with the following 
error: 
{noformat}
java.io.IOException: Spill failed
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at 
org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
at 
org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at 
org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
at 
org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
at 
org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
at 

[jira] [Updated] (MAPREDUCE-5031) Maps hitting IndexOutOfBoundsException for higher values of mapreduce.task.io.sort.mb

2013-02-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5031:


Description: 
While trying to reproduce MAPREDUCE-5028 on trunk, ran into what seems to be a 
different issue. To reproduce:

Pseudo-dist mode: mapreduce.{map,reduce}.memory.mb=2048, 
mapreduce.{map,reduce}.java.opts=-Xmx2048m, mapreduce.task.io.sort.mb=1280

The map tasks fail with the following error: 
{noformat}
Error: java.lang.IndexOutOfBoundsException at 
java.nio.Buffer.checkIndex(Buffer.java:512) at
 java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:113) at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1141) at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:686) at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
 at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
 at 
org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:47) at 
org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:36) at 
org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:339) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1488)
 at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
{noformat}


  was:
While trying to reproduce MAPREDUCE-5028 on trunk, ran into what seems to be a 
different issue. To reproduce:

Pseudo-dist mode: mapreduce.{map,reduce}.memory.mb=2048, 
mapreduce.{map,reduce}.java.opts=-Xmx2048m, mapreduce.task.io.sort.mb=1280

The map tasks fail with the following error: 
{noformat}
Error: java.lang.IndexOutOfBoundsException at 
java.nio.Buffer.checkIndex(Buffer.java:512) at 
java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:113) at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1141) at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:686) at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
 at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
 at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:47) 
at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:36) 
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:339) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1488)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
{noformat}



 Maps hitting IndexOutOfBoundsException for higher values of 
 mapreduce.task.io.sort.mb
 -

 Key: MAPREDUCE-5031
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5031
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 While trying to reproduce MAPREDUCE-5028 on trunk, ran into what seems to be 
 a different issue. To reproduce:
 Pseudo-dist mode: mapreduce.{map,reduce}.memory.mb=2048, 
 mapreduce.{map,reduce}.java.opts=-Xmx2048m, mapreduce.task.io.sort.mb=1280
 The map tasks fail with the following error: 
 {noformat}
 Error: java.lang.IndexOutOfBoundsException at 
 java.nio.Buffer.checkIndex(Buffer.java:512) at
  java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:113) at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1141) 
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:686) 
 at 
 org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
  at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
  at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:47) 
 at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:36) 
 at 
 org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at 
 org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757) at 
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:339) at 
 

[jira] [Updated] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5028:


Status: Patch Available  (was: Open)

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587298#comment-13587298
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570924/mr-5028-branch1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3360//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



[jira] [Assigned] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker reassigned MAPREDUCE-5027:


Assignee: Robert Parker

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker

 The ShuffleHandler does not have any configurable limits to the number of 
 outstanding connections allowed.  Therefore a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can run a 
 nodemanager out of file descriptors.



[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Status: Patch Available  (was: Open)

netty seems to be more geared toward limiting connections per IP.
Extended the idea provided by Jason (thanks for the code snippet).
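For reference, the general shape of a global connection limit is sketched below. This is illustrative only, not the code in the attached patch; the class and method names are mine. The idea is a shared counter checked when a channel opens, so that connections over the limit are rejected (closed) instead of consuming a file descriptor for the life of the fetch:

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Minimal sketch of a global connection limiter of the kind a shuffle
 * handler could consult before serving a fetch. Names and the limit
 * value are illustrative, not taken from the actual patch.
 */
public class ConnectionLimiter {
    private final int maxConnections;
    private final AtomicInteger open = new AtomicInteger();

    public ConnectionLimiter(int maxConnections) {
        this.maxConnections = maxConnections;
    }

    /** Returns true if the connection may proceed; false if the caller
     *  should close the channel immediately. */
    public boolean tryAcquire() {
        while (true) {
            int current = open.get();
            if (current >= maxConnections) {
                return false; // over the limit; reject rather than queue
            }
            // CAS so concurrent accepts cannot both slip under the limit
            if (open.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Must be called exactly once when an admitted connection closes. */
    public void release() {
        open.decrementAndGet();
    }
}
```

In a Netty-based handler the `tryAcquire()` check would run in the channel-open callback, with `release()` hooked to channel close, but the counting logic itself is framework-independent.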

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.5, 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limits to the number of 
 outstanding connections allowed.  Therefore a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: MAPREDUCE-5027.patch

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limits to the number of 
 outstanding connections allowed.  Therefore a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587422#comment-13587422
 ] 

Hadoop QA commented on MAPREDUCE-5027:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571028/MAPREDUCE-5027.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle:

  org.apache.hadoop.mapred.TestShuffleHandler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3361//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3361//console

This message is automatically generated.

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limits to the number of 
 outstanding connections allowed.  Therefore a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5032) MapTask.MapOutputBuffer contains arithmetic overflows

2013-02-26 Thread Chris Douglas (JIRA)
Chris Douglas created MAPREDUCE-5032:


 Summary: MapTask.MapOutputBuffer contains arithmetic overflows
 Key: MAPREDUCE-5032
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5032
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.23.5, 2.0.3-alpha, 1.1.1
Reporter: Chris Douglas
Assignee: Chris Douglas


There are several places where offsets into the collection buffer can overflow 
when applied to large buffers. These cases should be handled.
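The failure mode being described can be shown in isolation. The sketch below is illustrative only (it is not MapTask code): an {{int}} offset computation silently wraps negative once the buffer approaches 2 GB, while widening to {{long}} before combining terms keeps the arithmetic exact:

```java
/**
 * Illustrative only: demonstrates the int-overflow hazard described in
 * MAPREDUCE-5032 and the usual remedy (widen to long before adding).
 * Not actual MapTask.MapOutputBuffer code.
 */
public class OffsetOverflow {

    /** int arithmetic wraps silently past Integer.MAX_VALUE, producing
     *  a negative "offset" that corrupts any subsequent array indexing. */
    static int unsafeOffset(int base, int delta) {
        return base + delta; // may wrap negative on large buffers
    }

    /** Widening both operands to long before the addition preserves the
     *  true value; callers can then bound-check against the buffer size. */
    static long safeOffset(int base, int delta) {
        return (long) base + (long) delta;
    }
}
```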

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587448#comment-13587448
 ] 

Chris Douglas commented on MAPREDUCE-5028:
--

bq. getLength() returns the size of the entire buffer, and not just the 
remaining part of the buffer

Just to be clear, it's not the size of the backing array, but the index one 
greater than the last valid character in the input stream buffer. 
(ByteArrayInputStream) The change to {{DataInputBuffer}} implies the former, 
which is inaccurate.

The patch corrects several misuses of DataInputBuffer, which is great. There's 
another misuse at {{ReduceContextImpl.ValueIterator::next}} that could be 
included with these changes.

Most of this code doesn't check for overflow; it wasn't written for extremely 
large buffers. Just glancing at related code, MapTask.InMemValBytes contains 
code that could overflow, and I'm sure there are others. Filed MAPREDUCE-5032.
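The distinction above (end index versus backing-array size) can be seen directly in the stdlib class that backs Hadoop's {{DataInputBuffer}}. The subclass below is illustrative only and exists just to expose the protected fields; the names {{PeekableBuffer}}, {{endIndex}}, and {{startIndex}} are mine:

```java
import java.io.ByteArrayInputStream;

/**
 * Illustrative only: exposes ByteArrayInputStream's protected fields to
 * show that 'count' is one past the last valid byte of the input stream,
 * not the size of the backing array. This mirrors what the historical
 * getLength() on Hadoop's buffer classes reported.
 */
class PeekableBuffer extends ByteArrayInputStream {
    PeekableBuffer(byte[] buf, int offset, int length) {
        super(buf, offset, length);
    }
    /** Index one greater than the last valid byte. */
    int endIndex() { return count; }
    /** Current read position (initially the offset). */
    int startIndex() { return pos; }
}
```

For a 10-byte backing array wrapped with offset 2 and length 3, {{endIndex()}} is 5 even though the array's length is 10, which is exactly the "not the size of the backing array" point.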

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4693) Historyserver should provide counters for failed tasks

2013-02-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated MAPREDUCE-4693:
-

Attachment: MAPREDUCE-4693.4.patch

 Historyserver should provide counters for failed tasks
 --

 Key: MAPREDUCE-4693
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4693
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Xuan Gong
  Labels: usability
 Attachments: MAPREDUCE-4693.1.patch, MAPREDUCE-4693.2.patch, 
 MAPREDUCE-4693.3.patch, MAPREDUCE-4693.4.patch


 Currently the historyserver is not providing counters for failed tasks, even 
 though they are available via the AM as long as the job is still running.  
 Those counters are lost when the client needs to redirect to the 
 historyserver after the job completes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4693) Historyserver should provide counters for failed tasks

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587509#comment-13587509
 ] 

Hadoop QA commented on MAPREDUCE-4693:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12571043/MAPREDUCE-4693.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-tools/hadoop-rumen.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3362//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3362//console

This message is automatically generated.

 Historyserver should provide counters for failed tasks
 --

 Key: MAPREDUCE-4693
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4693
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Xuan Gong
  Labels: usability
 Attachments: MAPREDUCE-4693.1.patch, MAPREDUCE-4693.2.patch, 
 MAPREDUCE-4693.3.patch, MAPREDUCE-4693.4.patch


 Currently the historyserver is not providing counters for failed tasks, even 
 though they are available via the AM as long as the job is still running.  
 Those counters are lost when the client needs to redirect to the 
 historyserver after the job completes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)

2013-02-26 Thread Andrew Wang (JIRA)
Andrew Wang created MAPREDUCE-5033:
--

 Summary: mapred shell script should respect usage flags (--help 
-help -h)
 Key: MAPREDUCE-5033
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.3-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor


Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y 
help flags.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)

2013-02-26 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated MAPREDUCE-5033:
---

Attachment: mapreduce-5033-1.patch

Little patch attached. Tested manually by running the mapred script.

 mapred shell script should respect usage flags (--help -help -h)
 

 Key: MAPREDUCE-5033
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.3-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: mapreduce-5033-1.patch


 Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y 
 help flags.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)

2013-02-26 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated MAPREDUCE-5033:
---

Status: Patch Available  (was: Open)

 mapred shell script should respect usage flags (--help -help -h)
 

 Key: MAPREDUCE-5033
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.3-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: mapreduce-5033-1.patch


 Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y 
 help flags.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4693) Historyserver should provide counters for failed tasks

2013-02-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated MAPREDUCE-4693:
-

Status: Open  (was: Patch Available)

 Historyserver should provide counters for failed tasks
 --

 Key: MAPREDUCE-4693
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4693
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Xuan Gong
  Labels: usability
 Attachments: MAPREDUCE-4693.1.patch, MAPREDUCE-4693.2.patch, 
 MAPREDUCE-4693.3.patch, MAPREDUCE-4693.4.patch


 Currently the historyserver is not providing counters for failed tasks, even 
 though they are available via the AM as long as the job is still running.  
 Those counters are lost when the client needs to redirect to the 
 historyserver after the job completes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4693) Historyserver should provide counters for failed tasks

2013-02-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated MAPREDUCE-4693:
-

Status: Patch Available  (was: Open)

 Historyserver should provide counters for failed tasks
 --

 Key: MAPREDUCE-4693
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4693
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Xuan Gong
  Labels: usability
 Attachments: MAPREDUCE-4693.1.patch, MAPREDUCE-4693.2.patch, 
 MAPREDUCE-4693.3.patch, MAPREDUCE-4693.4.patch


 Currently the historyserver is not providing counters for failed tasks, even 
 though they are available via the AM as long as the job is still running.  
 Those counters are lost when the client needs to redirect to the 
 historyserver after the job completes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: MAPREDUCE-5027.patch

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch, MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limits to the number of 
 outstanding connections allowed.  Therefore a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587624#comment-13587624
 ] 

Hadoop QA commented on MAPREDUCE-5033:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12571051/mapreduce-5033-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3363//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3363//console

This message is automatically generated.

 mapred shell script should respect usage flags (--help -help -h)
 

 Key: MAPREDUCE-5033
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.3-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: mapreduce-5033-1.patch


 Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y 
 help flags.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587654#comment-13587654
 ] 

Hadoop QA commented on MAPREDUCE-5027:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571063/MAPREDUCE-5027.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle:

  org.apache.hadoop.mapred.TestShuffleHandler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3364//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3364//console

This message is automatically generated.

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch, MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limits to the number of 
 outstanding connections allowed.  Therefore a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5034) Class cast exception in MergeManagerImpl.java

2013-02-26 Thread Mariappan Asokan (JIRA)
Mariappan Asokan created MAPREDUCE-5034:
---

 Summary: Class cast exception in MergeManagerImpl.java
 Key: MAPREDUCE-5034
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5034
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Mariappan Asokan


When a reduce-side merge spills to disk, the following exception was thrown:

{noformat}
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$CompressAwarePath cannot be cast to java.lang.Comparable
  at java.util.TreeMap.put(TreeMap.java:542)
  at java.util.TreeSet.add(TreeSet.java:238)
  at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.closeOnDiskFile(MergeManagerImpl.java:340)
  at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:495)
  at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
{noformat}

It looks like a bug introduced by MAPREDUCE-2264.
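The failure pattern in the trace is the standard one for natural-order sorted collections: a {{TreeSet}} built without a {{Comparator}} is backed by a {{TreeMap}} that casts each key to {{Comparable}}, so adding elements of a class that does not implement {{Comparable}} throws exactly this {{ClassCastException}}. The stand-in class below is illustrative (it is not the real CompressAwarePath):

```java
import java.util.TreeSet;

/**
 * Illustrative stand-in for the pre-fix CompressAwarePath: a class that
 * does not implement Comparable cannot live in a natural-order TreeSet;
 * TreeMap.put fails on the cast to Comparable.
 */
public class TreeSetCast {
    static class NotComparable {
        final String path;
        NotComparable(String path) { this.path = path; }
    }

    /** Returns true if adding non-Comparable elements threw, as expected. */
    static boolean addThrows() {
        TreeSet<NotComparable> set = new TreeSet<NotComparable>();
        try {
            set.add(new NotComparable("spill0.out"));
            set.add(new NotComparable("spill1.out")); // forces a compare
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }
}
```

The fix is either to have the element class implement {{Comparable}} or to construct the set with an explicit {{Comparator}}.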

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-2264) Job status exceeds 100% in some cases

2013-02-26 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587714#comment-13587714
 ] 

Mariappan Asokan commented on MAPREDUCE-2264:
-

The patch has introduced a class cast exception.  Please see MAPREDUCE-5034

 Job status exceeds 100% in some cases 
 --

 Key: MAPREDUCE-2264
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.2, 0.20.205.0
Reporter: Adam Kramer
Assignee: Devaraj K
  Labels: critical-0.22.0
 Fix For: 1.2.0, 2.0.3-alpha

 Attachments: MAPREDUCE-2264-0.20.205-1.patch, 
 MAPREDUCE-2264-0.20.205.patch, MAPREDUCE-2264-0.20.3.patch, 
 MAPREDUCE-2264-branch-1-1.patch, MAPREDUCE-2264-branch-1-2.patch, 
 MAPREDUCE-2264-branch-1.patch, MAPREDUCE-2264-trunk-1.patch, 
 MAPREDUCE-2264-trunk-1.patch, MAPREDUCE-2264-trunk-2.patch, 
 MAPREDUCE-2264-trunk-3.patch, MAPREDUCE-2264-trunk-4.patch, 
 MAPREDUCE-2264-trunk-5.patch, MAPREDUCE-2264-trunk-5.patch, 
 MAPREDUCE-2264-trunk-addendum.patch, MAPREDUCE-2264-trunk.patch, more than 
 100%.bmp


 I'm looking now at my jobtracker's list of running reduce tasks. One of them 
 is 120.05% complete, the other is 107.28% complete.
 I understand that these numbers are estimates, but there is no case in which 
 an estimate of 100% for a non-complete task is better than an estimate of 
 99.99%, nor is there any case in which an estimate greater than 100% is valid.
 I suggest that whatever logic is computing these set 99.99% as a hard maximum.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5034) Class cast exception in MergeManagerImpl.java

2013-02-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587719#comment-13587719
 ] 

Sandy Ryza commented on MAPREDUCE-5034:
---

Which version was this found in?  An issue that looks very similar to this was 
already discovered in MAPREDUCE-2264.  Because of it, the original patch was 
reverted and replaced with an updated one.

 Class cast exception in MergeManagerImpl.java
 -

 Key: MAPREDUCE-5034
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5034
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Mariappan Asokan

 When reduce side merge spills to disk, the following exception was thrown:
 org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$CompressAwarePath 
 cannot be cast to java.lang.Comparable at 
 java.util.TreeMap.put(TreeMap.java:542) at 
 java.util.TreeSet.add(TreeSet.java:238) at 
 org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.closeOnDiskFile(MergeManagerImpl.java:340)
  at 
 org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:495)
  at 
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
 It looks like a bug introduced by MAPREDUCE-2264

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: MAPREDUCE-5027.patch

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limits to the number of 
 outstanding connections allowed.  Therefore a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: (was: MAPREDUCE-5027.patch)

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limits to the number of 
 outstanding connections allowed.  Therefore a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira



[jira] [Commented] (MAPREDUCE-2264) Job status exceeds 100% in some cases

2013-02-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587725#comment-13587725
 ] 

Sandy Ryza commented on MAPREDUCE-2264:
---

This looks the same as the issue that was reported by Chris on January 27th - 
is it possible you're using the version of the patch that was reverted?  The 
way to check would be to see whether CompressAwarePath extends Path.

 Job status exceeds 100% in some cases 
 --

 Key: MAPREDUCE-2264
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.2, 0.20.205.0
Reporter: Adam Kramer
Assignee: Devaraj K
  Labels: critical-0.22.0
 Fix For: 1.2.0, 2.0.3-alpha

 Attachments: MAPREDUCE-2264-0.20.205-1.patch, 
 MAPREDUCE-2264-0.20.205.patch, MAPREDUCE-2264-0.20.3.patch, 
 MAPREDUCE-2264-branch-1-1.patch, MAPREDUCE-2264-branch-1-2.patch, 
 MAPREDUCE-2264-branch-1.patch, MAPREDUCE-2264-trunk-1.patch, 
 MAPREDUCE-2264-trunk-1.patch, MAPREDUCE-2264-trunk-2.patch, 
 MAPREDUCE-2264-trunk-3.patch, MAPREDUCE-2264-trunk-4.patch, 
 MAPREDUCE-2264-trunk-5.patch, MAPREDUCE-2264-trunk-5.patch, 
 MAPREDUCE-2264-trunk-addendum.patch, MAPREDUCE-2264-trunk.patch, more than 
 100%.bmp


 I'm looking now at my jobtracker's list of running reduce tasks. One of them 
 is 120.05% complete, the other is 107.28% complete.
 I understand that these numbers are estimates, but there is no case in which 
 an estimate of 100% for a non-complete task is better than an estimate of 
 99.99%, nor is there any case in which an estimate greater than 100% is valid.
 I suggest that whatever logic is computing these should set 99.99% as a hard maximum.
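The suggested cap is simple to express. A minimal stand-alone sketch follows; the class and method names are hypothetical, not the actual JobTracker progress code.

```java
// Hypothetical sketch of the capping suggested above: treat any progress
// estimate above 99.99% for an incomplete task as 99.99%. Estimates can
// drift past 1.0 when, e.g., decompressed bytes processed exceed the
// compressed input length used as the denominator.
public class ProgressClamp {
    static final float MAX_INCOMPLETE_PROGRESS = 0.9999f;

    static float clampProgress(float estimate, boolean complete) {
        if (complete) {
            return 1.0f; // only a finished task may report 100%
        }
        return Math.min(estimate, MAX_INCOMPLETE_PROGRESS);
    }

    public static void main(String[] args) {
        System.out.println(clampProgress(1.2005f, false)); // 0.9999
        System.out.println(clampProgress(0.5f, false));    // 0.5
        System.out.println(clampProgress(1.0f, true));     // 1.0
    }
}
```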

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5034) Class cast exception in MergeManagerImpl.java

2013-02-26 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587727#comment-13587727
 ] 

Mariappan Asokan commented on MAPREDUCE-5034:
-

Hi Sandy,
  It could be a previous version of the patch that we picked up for testing.  I 
will confirm soon.

-- Asokan


 Class cast exception in MergeManagerImpl.java
 -

 Key: MAPREDUCE-5034
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5034
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Mariappan Asokan

 When reduce side merge spills to disk, the following exception was thrown:
 {noformat}
 org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$CompressAwarePath cannot be cast to java.lang.Comparable
   at java.util.TreeMap.put(TreeMap.java:542)
   at java.util.TreeSet.add(TreeSet.java:238)
   at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.closeOnDiskFile(MergeManagerImpl.java:340)
   at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:495)
   at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
 {noformat}
 It looks like a bug introduced by MAPREDUCE-2264.
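The cast failure is easy to reproduce outside Hadoop: {{TreeSet}} is backed by {{TreeMap}}, which casts elements to {{Comparable}} when no {{Comparator}} is supplied. A small stand-alone demonstration, where {{NotComparable}} stands in for {{CompressAwarePath}}:

```java
import java.util.TreeSet;

// TreeSet insertion invokes compare(), which casts elements to Comparable
// when the set was built without a Comparator. A type that does not
// implement Comparable therefore fails with ClassCastException, as in the
// reported stack trace. Implementing Comparable fixes it.
public class TreeSetCastDemo {
    static class NotComparable {}

    static class ComparableId implements Comparable<ComparableId> {
        final int id;
        ComparableId(int id) { this.id = id; }
        @Override
        public int compareTo(ComparableId o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        TreeSet<NotComparable> set = new TreeSet<>();
        try {
            set.add(new NotComparable());
            set.add(new NotComparable()); // compare() is invoked by now
            System.out.println("no exception");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the report");
        }

        // Implementing Comparable avoids the failure.
        TreeSet<ComparableId> ok = new TreeSet<>();
        ok.add(new ComparableId(1));
        ok.add(new ComparableId(2));
        System.out.println(ok.size()); // 2
    }
}
```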

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5034) Class cast exception in MergeManagerImpl.java

2013-02-26 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587738#comment-13587738
 ] 

Mariappan Asokan commented on MAPREDUCE-5034:
-

Hi Sandy,
  You are right.  It is from a previous version of the patch.  Sorry about the 
confusion.  I will reject this Jira.

-- Asokan

 Class cast exception in MergeManagerImpl.java
 -

 Key: MAPREDUCE-5034
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5034
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Mariappan Asokan

 When reduce side merge spills to disk, the following exception was thrown:
 {noformat}
 org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$CompressAwarePath cannot be cast to java.lang.Comparable
   at java.util.TreeMap.put(TreeMap.java:542)
   at java.util.TreeSet.add(TreeSet.java:238)
   at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.closeOnDiskFile(MergeManagerImpl.java:340)
   at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:495)
   at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
 {noformat}
 It looks like a bug introduced by MAPREDUCE-2264.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (MAPREDUCE-5034) Class cast exception in MergeManagerImpl.java

2013-02-26 Thread Mariappan Asokan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan resolved MAPREDUCE-5034.
-

Resolution: Not A Problem

 Class cast exception in MergeManagerImpl.java
 -

 Key: MAPREDUCE-5034
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5034
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Mariappan Asokan

 When reduce side merge spills to disk, the following exception was thrown:
 {noformat}
 org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$CompressAwarePath cannot be cast to java.lang.Comparable
   at java.util.TreeMap.put(TreeMap.java:542)
   at java.util.TreeSet.add(TreeSet.java:238)
   at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.closeOnDiskFile(MergeManagerImpl.java:340)
   at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:495)
   at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
 {noformat}
 It looks like a bug introduced by MAPREDUCE-2264.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5028:


Attachment: mr-5028-branch1.patch

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587743#comment-13587743
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Thanks for your comments, Chris.

bq. Just to be clear, it's not the size of the backing array, but the index 
one greater than the last valid character in the input stream buffer. 
(ByteArrayInputStream) The change to DataInputBuffer implies the former, which 
is inaccurate.

Removed that comment, and added comments to {{DataInputBuffer#reset()}} to 
reflect {{ByteArrayInputStream}}.

bq. There's another misuse at ReduceContextImpl.ValueIterator::next that could 
be included with these changes.
If I am not mistaken, ReduceContextImpl exists only in the trunk versions, and 
this patch is only for branch-1. I filed MAPREDUCE-5031 earlier to address the 
slightly different issues on trunk; we can mark it as a duplicate of the one 
that you created. I have a half-baked patch and can submit it there.
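For reference, the {{ByteArrayInputStream}} semantics discussed above can be seen in isolation. This is a stand-alone sketch, not the patched Hadoop code: the third constructor argument is a length measured from the offset, and the stream's internal "count" becomes the index one past the last valid byte, not the size of the backing array.

```java
import java.io.ByteArrayInputStream;

// The (buf, offset, length) constructor limits the stream to the valid
// region of the buffer. DataInputBuffer#reset wraps this constructor, so
// callers must pass the valid-data length, never buf.length.
public class BaisCountDemo {
    public static void main(String[] args) {
        byte[] buf = new byte[8];           // backing array of 8 bytes
        buf[2] = 10; buf[3] = 20; buf[4] = 30;
        // Only bytes [2, 5) hold valid data: offset 2, length 3.
        ByteArrayInputStream in = new ByteArrayInputStream(buf, 2, 3);
        System.out.println(in.available()); // 3, not 8
        System.out.println(in.read());      // 10
        System.out.println(in.read());      // 20
        System.out.println(in.read());      // 30
        System.out.println(in.read());      // -1: end of valid data reached
    }
}
```

Passing the full array length here instead of the valid-data length would let readers run past the real data into stale bytes, which is the kind of misuse behind the EOFException above.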



 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587744#comment-13587744
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

The -1 from Hadoop QA was because the uploaded patch is for branch-1. I have 
verified that ant test-core passes.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: MAPREDUCE-5027.patch

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limit on the number of 
 outstanding connections allowed.  Therefore, a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: (was: MAPREDUCE-5027.patch)

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limit on the number of 
 outstanding connections allowed.  Therefore, a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587821#comment-13587821
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571090/mr-5028-branch1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3365//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587836#comment-13587836
 ] 

Hadoop QA commented on MAPREDUCE-5027:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571092/MAPREDUCE-5027.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3366//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3366//console

This message is automatically generated.

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limit on the number of 
 outstanding connections allowed.  Therefore, a node with many map outputs and 
 many reducers in the cluster trying to fetch those outputs can exhaust a 
 nodemanager's file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4659) Confusing output when running hadoop version from one hadoop installation when HADOOP_HOME points to another

2013-02-26 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-4659:
--

Attachment: MAPREDUCE-4659-branch-1.patch

 Confusing output when running hadoop version from one hadoop installation 
 when HADOOP_HOME points to another
 --

 Key: MAPREDUCE-4659
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4659
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.20.2, 2.0.1-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-4659-2.patch, MAPREDUCE-4659-3.patch, 
 MAPREDUCE-4659-4.patch, MAPREDUCE-4659-5.patch, 
 MAPREDUCE-4659-branch-1.patch, MAPREDUCE-4659.patch


 Hadoop version X is downloaded to ~/hadoop-x, and Hadoop version Y is 
 downloaded to ~/hadoop-y.  HADOOP_HOME is set to hadoop-x.  A user running 
 hadoop-y/bin/hadoop might expect to be running the hadoop-y jars, but, 
 because of HADOOP_HOME, will actually be running hadoop-x jars.
 hadoop version could help clear this up a little by reporting the current 
 HADOOP_HOME.
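A stand-alone sketch of the kind of reporting the issue asks for; the describe helper and its message wording are hypothetical, and the actual patch's output may differ.

```java
// Hypothetical sketch: have "hadoop version" surface HADOOP_HOME so users
// notice when the jars they run come from a different installation than
// the script they invoked.
public class VersionInfoSketch {
    static String describe(String invokedFrom, String hadoopHome) {
        if (hadoopHome != null && !invokedFrom.startsWith(hadoopHome)) {
            return "Warning: running from " + invokedFrom
                + " but HADOOP_HOME=" + hadoopHome + " takes precedence";
        }
        return "HADOOP_HOME=" + (hadoopHome == null ? "(unset)" : hadoopHome);
    }

    public static void main(String[] args) {
        // hadoop-y script invoked while HADOOP_HOME points at hadoop-x:
        System.out.println(describe("/home/u/hadoop-y/bin/hadoop", "/home/u/hadoop-x"));
        // consistent installation: plain report
        System.out.println(describe("/home/u/hadoop-x/bin/hadoop", "/home/u/hadoop-x"));
    }
}
```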

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4659) Confusing output when running hadoop version from one hadoop installation when HADOOP_HOME points to another

2013-02-26 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-4659:
--

Attachment: MAPREDUCE-4659-6.patch

 Confusing output when running hadoop version from one hadoop installation 
 when HADOOP_HOME points to another
 --

 Key: MAPREDUCE-4659
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4659
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.20.2, 2.0.1-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-4659-2.patch, MAPREDUCE-4659-3.patch, 
 MAPREDUCE-4659-4.patch, MAPREDUCE-4659-5.patch, MAPREDUCE-4659-6.patch, 
 MAPREDUCE-4659-branch-1.patch, MAPREDUCE-4659.patch


 Hadoop version X is downloaded to ~/hadoop-x, and Hadoop version Y is 
 downloaded to ~/hadoop-y.  HADOOP_HOME is set to hadoop-x.  A user running 
 hadoop-y/bin/hadoop might expect to be running the hadoop-y jars, but, 
 because of HADOOP_HOME, will actually be running hadoop-x jars.
 hadoop version could help clear this up a little by reporting the current 
 HADOOP_HOME.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4659) Confusing output when running hadoop version from one hadoop installation when HADOOP_HOME points to another

2013-02-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587927#comment-13587927
 ] 

Sandy Ryza commented on MAPREDUCE-4659:
---

Attached branch-1 patch and refresh for trunk

 Confusing output when running hadoop version from one hadoop installation 
 when HADOOP_HOME points to another
 --

 Key: MAPREDUCE-4659
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4659
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.20.2, 2.0.1-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-4659-2.patch, MAPREDUCE-4659-3.patch, 
 MAPREDUCE-4659-4.patch, MAPREDUCE-4659-5.patch, MAPREDUCE-4659-6.patch, 
 MAPREDUCE-4659-branch-1.patch, MAPREDUCE-4659.patch


 Hadoop version X is downloaded to ~/hadoop-x, and Hadoop version Y is 
 downloaded to ~/hadoop-y.  HADOOP_HOME is set to hadoop-x.  A user running 
 hadoop-y/bin/hadoop might expect to be running the hadoop-y jars, but, 
 because of HADOOP_HOME, will actually be running hadoop-x jars.
 hadoop version could help clear this up a little by reporting the current 
 HADOOP_HOME.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker updated MAPREDUCE-5027:
-

Attachment: MAPREDUCE-5027.patch

Added timeout parameter to test annotation.

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch, MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limit on the number of 
 outstanding connections allowed.  Therefore a node with many map outputs, and 
 many reducers in the cluster trying to fetch those outputs, can exhaust a 
 nodemanager's supply of file descriptors.
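One way to impose such a cap (a hypothetical sketch of the idea only, not the attached patch; the real ShuffleHandler fix may account for connections differently) is to gate each accepted connection on a counting semaphore:

```java
import java.util.concurrent.Semaphore;

// Illustrative cap on concurrent shuffle connections. Connections beyond
// the limit are rejected immediately instead of consuming a descriptor.
public class ConnectionLimiter {
    private final Semaphore permits;

    public ConnectionLimiter(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    /** Try to admit a new connection; false means "reject, at capacity". */
    public boolean tryAccept() {
        return permits.tryAcquire();
    }

    /** Must be called exactly once per admitted connection when it closes. */
    public void release() {
        permits.release();
    }

    public static void main(String[] args) {
        ConnectionLimiter limiter = new ConnectionLimiter(2);
        System.out.println(limiter.tryAccept()); // admitted
        System.out.println(limiter.tryAccept()); // admitted
        System.out.println(limiter.tryAccept()); // rejected: at capacity
        limiter.release();                       // one connection closes
        System.out.println(limiter.tryAccept()); // admitted again
    }
}
```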

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)

2013-02-26 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587931#comment-13587931
 ] 

Aaron T. Myers commented on MAPREDUCE-5033:
---

+1, patch looks good to me. I'm going to commit this momentarily.

 mapred shell script should respect usage flags (--help -help -h)
 

 Key: MAPREDUCE-5033
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.3-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: mapreduce-5033-1.patch


 Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y 
 help flags.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)

2013-02-26 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated MAPREDUCE-5033:
--

  Resolution: Fixed
   Fix Version/s: 2.0.4-beta
Target Version/s: 2.0.4-beta
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Andrew.

 mapred shell script should respect usage flags (--help -help -h)
 

 Key: MAPREDUCE-5033
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.3-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Fix For: 2.0.4-beta

 Attachments: mapreduce-5033-1.patch


 Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y 
 help flags.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5027) Shuffle does not limit number of outstanding connections

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587940#comment-13587940
 ] 

Hadoop QA commented on MAPREDUCE-5027:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571124/MAPREDUCE-5027.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3367//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3367//console

This message is automatically generated.

 Shuffle does not limit number of outstanding connections
 

 Key: MAPREDUCE-5027
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5027
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Robert Parker
 Attachments: MAPREDUCE-5027.patch, MAPREDUCE-5027.patch


 The ShuffleHandler does not have any configurable limit on the number of 
 outstanding connections allowed.  Therefore a node with many map outputs, and 
 many reducers in the cluster trying to fetch those outputs, can exhaust a 
 nodemanager's supply of file descriptors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5033) mapred shell script should respect usage flags (--help -help -h)

2013-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587941#comment-13587941
 ] 

Hudson commented on MAPREDUCE-5033:
---

Integrated in Hadoop-trunk-Commit #3388 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3388/])
MAPREDUCE-5033. mapred shell script should respect usage flags (--help 
-help -h). Contributed by Andrew Wang. (Revision 1450584)

 Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1450584
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred


 mapred shell script should respect usage flags (--help -help -h)
 

 Key: MAPREDUCE-5033
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5033
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.3-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Fix For: 2.0.4-beta

 Attachments: mapreduce-5033-1.patch


 Like in HADOOP-9267, the mapred shell script should respect the normal Unix-y 
 help flags.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4659) Confusing output when running hadoop version from one hadoop installation when HADOOP_HOME points to another

2013-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587976#comment-13587976
 ] 

Hadoop QA commented on MAPREDUCE-4659:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12571123/MAPREDUCE-4659-6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3368//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3368//console

This message is automatically generated.

 Confusing output when running hadoop version from one hadoop installation 
 when HADOOP_HOME points to another
 --

 Key: MAPREDUCE-4659
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4659
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.20.2, 2.0.1-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-4659-2.patch, MAPREDUCE-4659-3.patch, 
 MAPREDUCE-4659-4.patch, MAPREDUCE-4659-5.patch, MAPREDUCE-4659-6.patch, 
 MAPREDUCE-4659-branch-1.patch, MAPREDUCE-4659.patch


 Hadoop version X is downloaded to ~/hadoop-x, and Hadoop version Y is 
 downloaded to ~/hadoop-y.  HADOOP_HOME is set to hadoop-x.  A user running 
 hadoop-y/bin/hadoop might expect to be running the hadoop-y jars, but, 
 because of HADOOP_HOME, will actually be running hadoop-x jars.
 hadoop version could help clear this up a little by reporting the current 
 HADOOP_HOME.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira