[jira] [Updated] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks

2014-04-17 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated MAPREDUCE-4711:


Assignee: (was: Ravi Prakash)

 Append time elapsed since job-start-time for finished tasks
 ---

 Key: MAPREDUCE-4711
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.3
Reporter: Ravi Prakash
 Attachments: MAPREDUCE-4711.branch-0.23.patch


 In 0.20.x/1.x, the analyze job link gave this information
 bq. The last Map task task_sometask finished at (relative to the Job launch 
 time): 5/10 20:23:10 (1hrs, 27mins, 54sec)
 The time it took for the last task to finish needs to be calculated mentally 
 in 0.23. I believe we should print it next to the finish time.
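 
 As a rough illustration of the requested display, a minimal Java sketch assuming hypothetical variables jobLaunchTime and taskFinishTime (in milliseconds), not the actual JobHistoryServer fields:
 {code}
 // Elapsed time between job launch and the last task's finish, formatted for the UI.
 long elapsedMs = taskFinishTime - jobLaunchTime;
 long hrs  = elapsedMs / (1000L * 60 * 60);
 long mins = (elapsedMs / (1000L * 60)) % 60;
 long secs = (elapsedMs / 1000L) % 60;
 // Appended next to the absolute finish time, e.g. "5/10 20:23:10 (1hrs, 27mins, 54sec)"
 String suffix = String.format(" (%dhrs, %dmins, %dsec)", hrs, mins, secs);
 {code}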



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks

2014-04-17 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated MAPREDUCE-4711:


Target Version/s: 3.0.0, 2.5.0  (was: 3.0.0, 0.23.11)

 Append time elapsed since job-start-time for finished tasks
 ---

 Key: MAPREDUCE-4711
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.3
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: MAPREDUCE-4711.branch-0.23.patch


 In 0.20.x/1.x, the analyze job link gave this information
 bq. The last Map task task_sometask finished at (relative to the Job launch 
 time): 5/10 20:23:10 (1hrs, 27mins, 54sec)
 The time it took for the last task to finish needs to be calculated mentally 
 in 0.23. I believe we should print it next to the finish time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-3191) docs for map output compression incorrectly reference SequenceFile

2014-04-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972839#comment-13972839
 ] 

Hudson commented on MAPREDUCE-3191:
---

FAILURE: Integrated in Hadoop-Yarn-trunk #543 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/543/])
MAPREDUCE-3191. docs for map output compression incorrectly reference 
SequenceFile (Chen He via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588009)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobConf.java


 docs for map output compression incorrectly reference SequenceFile
 --

 Key: MAPREDUCE-3191
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3191
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Chen He
Priority: Trivial
  Labels: documentation, noob
 Fix For: 3.0.0, 0.23.11, 2.5.0, 2.4.1

 Attachments: MAPREDUCE-3191-v2.patch, MAPREDUCE-3191.patch


 The documentation currently says that map output compression uses 
 SequenceFile compression. This hasn't been true in several years, since we 
 use IFile for intermediate data now.
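 
 For context, map output compression is controlled through JobConf (the class whose javadoc this issue corrects); a minimal sketch using the long-standing JobConf setters:
 {code}
 // JobConf from org.apache.hadoop.mapred; SnappyCodec from org.apache.hadoop.io.compress.
 JobConf conf = new JobConf();
 // Compresses the intermediate (IFile) map output, not SequenceFile output.
 conf.setCompressMapOutput(true);
 conf.setMapOutputCompressorClass(SnappyCodec.class);
 {code}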



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-3286) Unit tests for MAPREDUCE-3186 - User jobs are getting hanged if the Resource manager process goes down and comes up while job is getting executed.

2014-04-17 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated MAPREDUCE-3286:
--

Target Version/s: trunk  (was: 0.23.0)

 Unit tests for MAPREDUCE-3186 - User jobs are getting hanged if the Resource 
 manager process goes down and comes up while job is getting executed.
 --

 Key: MAPREDUCE-3286
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3286
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: mrv2
Affects Versions: 0.23.0
 Environment: linux
Reporter: Eric Payne
Assignee: Eric Payne
  Labels: test

 If the resource manager is restarted while job execution is in progress, the 
 job hangs.
 The UI shows the job as running.
 In the RM log, it throws the error: ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 AppAttemptId doesnt exist in cache appattempt_1318579738195_0004_01
 On the console, the MRAppMaster and RunJar processes are not killed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-3191) docs for map output compression incorrectly reference SequenceFile

2014-04-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972972#comment-13972972
 ] 

Hudson commented on MAPREDUCE-3191:
---

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1735 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1735/])
MAPREDUCE-3191. docs for map output compression incorrectly reference 
SequenceFile (Chen He via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588009)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobConf.java


 docs for map output compression incorrectly reference SequenceFile
 --

 Key: MAPREDUCE-3191
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3191
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Chen He
Priority: Trivial
  Labels: documentation, noob
 Fix For: 3.0.0, 0.23.11, 2.5.0, 2.4.1

 Attachments: MAPREDUCE-3191-v2.patch, MAPREDUCE-3191.patch


 The documentation currently says that map output compression uses 
 SequenceFile compression. This hasn't been true in several years, since we 
 use IFile for intermediate data now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-3191) docs for map output compression incorrectly reference SequenceFile

2014-04-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973010#comment-13973010
 ] 

Hudson commented on MAPREDUCE-3191:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1760 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1760/])
MAPREDUCE-3191. docs for map output compression incorrectly reference 
SequenceFile (Chen He via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588009)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobConf.java


 docs for map output compression incorrectly reference SequenceFile
 --

 Key: MAPREDUCE-3191
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3191
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Chen He
Priority: Trivial
  Labels: documentation, noob
 Fix For: 3.0.0, 0.23.11, 2.5.0, 2.4.1

 Attachments: MAPREDUCE-3191-v2.patch, MAPREDUCE-3191.patch


 The documentation currently says that map output compression uses 
 SequenceFile compression. This hasn't been true in several years, since we 
 use IFile for intermediate data now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-04-17 Thread David Rosenstrauch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973102#comment-13973102
 ] 

David Rosenstrauch commented on MAPREDUCE-5402:
---

My apologies for not following up on this.  Solving this issue is starting to 
become a bigger priority for me now, so I'd like to pick it up again.

I'd like to try to test out Tsuyoshi's patch.  Could anyone provide some 
assistance/pointers on a couple of things?

1) I've never built hadoop from source.  Could anyone assist in pointing me to 
the proper location of the source trees I'd need to build the distcp.jar?

2) Would anyone have any tips on how I might be able to backport this new 
version of distcp to mrv1, similar to what Harsh did in building 
hadoop-distcp-mr1-2.0.0-mr1-cdh4.0.1-harsh.jar in 
https://issues.cloudera.org/browse/DISTRO-420 ?  Most of my Hadoop clusters are 
still running mrv1.

TIA!  Any pointers much appreciated!

 DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
 --

 Key: MAPREDUCE-5402
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp, mrv2
Reporter: David Rosenstrauch
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
 MAPREDUCE-5402.3.patch


 In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
 describes the implementation of DynamicInputFormat, with one of the main 
 motivations cited being to reduce the chance of long-tails where a few 
 leftover mappers run much longer than the rest.
 However, I today ran into a situation where I experienced exactly such a long 
 tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
 the problem by overriding the number of mappers and the split ratio used by 
 the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
 set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
 This constant is actually set quite low for production use.  (See a 
 description of my use case below.)  And although MAPREDUCE-2765 states that 
 this is an overridable maximum, when reading through the code there does 
 not actually appear to be any mechanism available to override it.
 This should be changed.  It should be possible to expand the maximum # of 
 chunks beyond this arbitrary limit.
 For example, here is the situation I ran into today:
 I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
 The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
 the number of mappers for the job from the default of 20 to 128, so as to 
 more properly parallelize the copy across the cluster.  The number of chunk 
 files created was calculated as 241, and mapred.num.entries.per.chunk was 
 calculated as 12.
 As the job ran on, it reached a point where there were only 4 remaining map 
 tasks, which had each been running for over 2 hours.  The reason for this was 
 that each of the 12 files that those mappers were copying were quite large 
 (several hundred megabytes in size) and took ~20 minutes each.  However, 
 during this time, all the other 124 mappers sat idle.
 In theory I should be able to alleviate this problem with DynamicInputFormat. 
  If I were able to, say, quadruple the number of chunk files created, that 
 would have made each chunk contain only 3 files, and these large files would 
 have gotten distributed better around the cluster and copied in parallel.
 However, when I tried to do that - by overriding mapred.listing.split.ratio 
 to, say, 10 - DynamicInputFormat responded with an exception (Too many 
 chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
 split-ratio to proceed.) - presumably because I exceeded the 
 MAX_CHUNKS_TOLERABLE value of 400.
 Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
 can't personally see any.
 If this limit has no particular logic behind it, then it should be 
 overridable - or even better:  removed altogether.  After all, I'm not sure I 
 see any need for it.  Even if numMaps * splitRatio resulted in an 
 extraordinarily large number, if the code were modified so that the number of 
 chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
 there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
 where the product of numMaps and splitRatio is large, capping the number of 
 chunks at the number of files (numberOfChunks = numberOfFiles) would result 
 in 1 file per chunk - the maximum parallelization possible.  That may not be 
 the best-tuned solution for some users, but I would think that it should be 
 left up to the user to deal 

[jira] [Updated] (MAPREDUCE-5841) uber job doesn't terminate on getting mapred job kill

2014-04-17 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated MAPREDUCE-5841:
---

Description: 
If you issue a mapred job -kill against a uberized job, the job (and the yarn 
application) state transitions to KILLED, but the application master process 
continues to run. The job actually runs to completion despite the killed status.

This can be easily reproduced by running a sleep job:

{noformat}
hadoop jar hadoop-mapreduce-client-jobclient-2.3.0-tests.jar sleep -m 1 -r 0 
-mt 30
{noformat}

Issue a kill with mapred job -kill \[job-id\]. The UI will show the job (app) 
is in the KILLED state. However, you can see the application master is still 
running.

  was:
If you issue a mapred job -kill against a uberized job, the job (and the yarn 
application) state transitions to KILLED, but the application master process 
continues to run. The job actually runs to completion.

This can be easily reproduced by running a sleep job:

{noformat}
hadoop jar hadoop-mapreduce-client-jobclient-2.3.0-tests.jar sleep -m 1 -r 0 
-mt 30
{noformat}

Issue a kill with mapred job -kill \[job-id\]. The UI will show the job (app) 
is in the KILLED state. However, you can see the application master is still 
running.


 uber job doesn't terminate on getting mapred job kill
 -

 Key: MAPREDUCE-5841
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5841
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 If you issue a mapred job -kill against a uberized job, the job (and the 
 yarn application) state transitions to KILLED, but the application master 
 process continues to run. The job actually runs to completion despite the 
 killed status.
 This can be easily reproduced by running a sleep job:
 {noformat}
 hadoop jar hadoop-mapreduce-client-jobclient-2.3.0-tests.jar sleep -m 1 -r 0 
 -mt 30
 {noformat}
 Issue a kill with mapred job -kill \[job-id\]. The UI will show the job 
 (app) is in the KILLED state. However, you can see the application master is 
 still running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-04-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973138#comment-13973138
 ] 

Hadoop QA commented on MAPREDUCE-5402:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12593727/MAPREDUCE-5402.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-distcp.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4530//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4530//console

This message is automatically generated.

 DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
 --

 Key: MAPREDUCE-5402
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp, mrv2
Reporter: David Rosenstrauch
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
 MAPREDUCE-5402.3.patch


 In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
 describes the implementation of DynamicInputFormat, with one of the main 
 motivations cited being to reduce the chance of long-tails where a few 
 leftover mappers run much longer than the rest.
 However, I today ran into a situation where I experienced exactly such a long 
 tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
 the problem by overriding the number of mappers and the split ratio used by 
 the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
 set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
 This constant is actually set quite low for production use.  (See a 
 description of my use case below.)  And although MAPREDUCE-2765 states that 
 this is an overridable maximum, when reading through the code there does 
 not actually appear to be any mechanism available to override it.
 This should be changed.  It should be possible to expand the maximum # of 
 chunks beyond this arbitrary limit.
 For example, here is the situation I ran into today:
 I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
 The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
 the number of mappers for the job from the default of 20 to 128, so as to 
 more properly parallelize the copy across the cluster.  The number of chunk 
 files created was calculated as 241, and mapred.num.entries.per.chunk was 
 calculated as 12.
 As the job ran on, it reached a point where there were only 4 remaining map 
 tasks, which had each been running for over 2 hours.  The reason for this was 
 that each of the 12 files that those mappers were copying were quite large 
 (several hundred megabytes in size) and took ~20 minutes each.  However, 
 during this time, all the other 124 mappers sat idle.
 In theory I should be able to alleviate this problem with DynamicInputFormat. 
  If I were able to, say, quadruple the number of chunk files created, that 
 would have made each chunk contain only 3 files, and these large files would 
 have gotten distributed better around the cluster and copied in parallel.
 However, when I tried to do that - by overriding mapred.listing.split.ratio 
 to, say, 10 - DynamicInputFormat responded with an exception (Too many 
 chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
 split-ratio to proceed.) - presumably because I exceeded the 
 MAX_CHUNKS_TOLERABLE value of 400.
 Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
 can't personally see any.
 If this limit has no particular logic behind it, then it should be 
 

[jira] [Created] (MAPREDUCE-5842) uber job with LinuxContainerExecutor doesn't work

2014-04-17 Thread Atkins (JIRA)
Atkins created MAPREDUCE-5842:
-

 Summary: uber job with LinuxContainerExecutor doesn't work
 Key: MAPREDUCE-5842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Atkins


Enabling ubertask with the LinuxContainerExecutor causes an exception:
{noformat}
2014-04-17 23:26:07,859 DEBUG [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: LocalFetcher 1 going to 
fetch: attempt_1397748070416_0001_m_06_0
2014-04-17 23:26:07,860 WARN [uber-SubtaskRunner] 
org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local 
(uberized) 'child' : 
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle 
in localfetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:351)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ExceptionInInitializerError
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:57)
at 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:123)
at 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:101)
at 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:84)
Caused by: java.lang.RuntimeException: Secure IO is not possible without native 
code extensions.
at org.apache.hadoop.io.SecureIOUtils.<clinit>(SecureIOUtils.java:75)
... 6 more
{noformat}
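
A rough sketch of the configuration combination described above (uber mode plus the LinuxContainerExecutor); the property names are the standard ones, but this minimal setup is an assumption rather than the reporter's actual cluster config:
{code}
Configuration conf = new Configuration();
// Normally set cluster-wide in yarn-site.xml on the NodeManagers:
conf.set("yarn.nodemanager.container-executor.class",
    "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor");
// Per-job switch that runs small jobs as uber (in-AM) tasks:
conf.setBoolean("mapreduce.job.ubertask.enable", true);
{code}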



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-04-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973143#comment-13973143
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-5402:
---

Hi David, 

To build Hadoop from trunk, protobuf 2.5.0 is required. You can get it from 
here: https://code.google.com/p/protobuf/downloads/list

Maybe the following commands will work to build Hadoop:
{code}
$ git clone git://git.apache.org/hadoop-common.git
$ cd hadoop-common
$ wget 
https://issues.apache.org/jira/secure/attachment/12593727/MAPREDUCE-5402.3.patch
$ patch -p1 < MAPREDUCE-5402.3.patch
$ mvn install -DskipTests
$ find . -name "*distcp*jar"
./hadoop-tools/hadoop-distcp/target/hadoop-distcp-3.0.0-SNAPSHOT-sources.jar
./hadoop-tools/hadoop-distcp/target/hadoop-distcp-3.0.0-SNAPSHOT.jar
{code}

The following links can help you understand how to build Hadoop:
http://wiki.apache.org/hadoop/HowToContribute
http://wiki.apache.org/hadoop/GitAndHadoop
https://github.com/apache/hadoop-common/blob/trunk/BUILDING.txt

Please let me know if you have any questions. Thanks.

 DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
 --

 Key: MAPREDUCE-5402
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp, mrv2
Reporter: David Rosenstrauch
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
 MAPREDUCE-5402.3.patch


 In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
 describes the implementation of DynamicInputFormat, with one of the main 
 motivations cited being to reduce the chance of long-tails where a few 
 leftover mappers run much longer than the rest.
 However, I today ran into a situation where I experienced exactly such a long 
 tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
 the problem by overriding the number of mappers and the split ratio used by 
 the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
 set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
 This constant is actually set quite low for production use.  (See a 
 description of my use case below.)  And although MAPREDUCE-2765 states that 
 this is an overridable maximum, when reading through the code there does 
 not actually appear to be any mechanism available to override it.
 This should be changed.  It should be possible to expand the maximum # of 
 chunks beyond this arbitrary limit.
 For example, here is the situation I ran into today:
 I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
 The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
 the number of mappers for the job from the default of 20 to 128, so as to 
 more properly parallelize the copy across the cluster.  The number of chunk 
 files created was calculated as 241, and mapred.num.entries.per.chunk was 
 calculated as 12.
 As the job ran on, it reached a point where there were only 4 remaining map 
 tasks, which had each been running for over 2 hours.  The reason for this was 
 that each of the 12 files that those mappers were copying were quite large 
 (several hundred megabytes in size) and took ~20 minutes each.  However, 
 during this time, all the other 124 mappers sat idle.
 In theory I should be able to alleviate this problem with DynamicInputFormat. 
  If I were able to, say, quadruple the number of chunk files created, that 
 would have made each chunk contain only 3 files, and these large files would 
 have gotten distributed better around the cluster and copied in parallel.
 However, when I tried to do that - by overriding mapred.listing.split.ratio 
 to, say, 10 - DynamicInputFormat responded with an exception (Too many 
 chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
 split-ratio to proceed.) - presumably because I exceeded the 
 MAX_CHUNKS_TOLERABLE value of 400.
 Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
 can't personally see any.
 If this limit has no particular logic behind it, then it should be 
 overridable - or even better:  removed altogether.  After all, I'm not sure I 
 see any need for it.  Even if numMaps * splitRatio resulted in an 
 extraordinarily large number, if the code were modified so that the number of 
 chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
 there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
 where the product of numMaps and splitRatio is large, capping the number of 
 chunks at the number of files (numberOfChunks = numberOfFiles) would result 
 in 1 file per chunk - the maximum parallelization possible.  That may not be 
 

[jira] [Commented] (MAPREDUCE-4931) Add user-APIs for classpath precedence control

2014-04-17 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973197#comment-13973197
 ] 

Chen He commented on MAPREDUCE-4931:


Hi [~qwertymaniac]
Is this JIRA still an issue for 2.x? If so, could you retarget it to 2.x?

 Add user-APIs for classpath precedence control
 --

 Key: MAPREDUCE-4931
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4931
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 1.0.0
Reporter: Harsh J
Priority: Minor

 The feature config from MAPREDUCE-1938 of allowing tasks to start with 
 user-classes-first is fairly popular and can use its own API hooks in 
 Job/JobConf classes, making it easier to discover and use it rather than 
 continuing to keep it as an advanced param.
 I propose to add two APIs to Job/JobConf:
 {code}
 void setUserClassesTakesPrecedence(boolean)
 boolean userClassesTakesPrecedence()
 {code}
 Both of which, depending on their branch of commit, set the property 
 {{mapreduce.user.classpath.first}} (1.x) or 
 {{mapreduce.job.user.classpath.first}} (trunk, 2.x and if needed, in 0.23.x).
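 
 Until such hooks exist, a minimal sketch of driving the same behavior through the property named above (the trunk/2.x key); the job name here is illustrative only:
 {code}
 Configuration conf = new Configuration();
 Job job = Job.getInstance(conf, "user-classpath-first-example");
 // Equivalent of the proposed setUserClassesTakesPrecedence(true):
 job.getConfiguration().setBoolean("mapreduce.job.user.classpath.first", true);
 {code}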



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-3089) Augment TestRMContainerAllocator to verify MAPREDUCE-2646

2014-04-17 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973208#comment-13973208
 ] 

Chen He commented on MAPREDUCE-3089:


Hi [~acmurthy]
Since both MAPREDUCE-3078 and MAPREDUCE-2646 are resolved, is this JIRA 
still an issue in 2.x? If so, could you retarget it to 2.x?

 Augment TestRMContainerAllocator to verify MAPREDUCE-2646
 -

 Key: MAPREDUCE-3089
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3089
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Arun C Murthy
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.24.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-4718) MapReduce fails If I pass a parameter as a S3 folder

2014-04-17 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973223#comment-13973223
 ] 

Chen He commented on MAPREDUCE-4718:


Hi [~benkimkimben]
This JIRA has had no updates since 11/Oct/12. Is it still a problem? We are 
currently cleaning up 0.23 JIRAs; if it is still a problem in 2.x, please 
retarget it to 2.x. Thanks!

 MapReduce fails If I pass a parameter as a S3 folder
 

 Key: MAPREDUCE-4718
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4718
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 1.0.0, 1.0.3
 Environment: Hadoop with default configurations
Reporter: Benjamin Kim

 I'm running a wordcount MR as follows
 hadoop jar WordCount.jar wordcount.WordCountDriver 
 s3n://bucket/wordcount/input s3n://bucket/wordcount/output
  
 s3n://bucket/wordcount/input is an S3 object that contains other input files.
 However, I get the following NPE error:
 12/10/02 18:56:23 INFO mapred.JobClient:  map 0% reduce 0%
 12/10/02 18:56:54 INFO mapred.JobClient:  map 50% reduce 0%
 12/10/02 18:56:56 INFO mapred.JobClient: Task Id : 
 attempt_201210021853_0001_m_01_0, Status : FAILED
 java.lang.NullPointerException
 at 
 org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
 at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
 at java.io.FilterInputStream.close(FilterInputStream.java:155)
 at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
 at 
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.close(LineRecordReader.java:144)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:497)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 MR runs fine if I specify a more specific input path such as 
 s3n://bucket/wordcount/input/file.txt
 MR fails if I pass an S3 folder as a parameter.
 In summary,
 This works
  hadoop jar ./hadoop-examples-1.0.3.jar wordcount 
 /user/hadoop/wordcount/input/ s3n://bucket/wordcount/output/
 This doesn't work
  hadoop jar ./hadoop-examples-1.0.3.jar wordcount 
 s3n://bucket/wordcount/input/ s3n://bucket/wordcount/output/
 (both input paths are directories)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (MAPREDUCE-5843) TestMRKeyValueTextInputFormat failing on Windows

2014-04-17 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev moved YARN-1956 to MAPREDUCE-5843:


Key: MAPREDUCE-5843  (was: YARN-1956)
Project: Hadoop Map/Reduce  (was: Hadoop YARN)

 TestMRKeyValueTextInputFormat failing on Windows
 

 Key: MAPREDUCE-5843
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5843
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1956.0.patch


 TestMRKeyValueTextInputFormat fails intermittently on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5843) TestMRKeyValueTextInputFormat failing on Windows

2014-04-17 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated MAPREDUCE-5843:
-

Attachment: apache-mapreduce-5843.1.patch

 TestMRKeyValueTextInputFormat failing on Windows
 

 Key: MAPREDUCE-5843
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5843
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-mapreduce-5843.1.patch, apache-yarn-1956.0.patch


 TestMRKeyValueTextInputFormat fails intermittently on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5843) TestMRKeyValueTextInputFormat failing on Windows

2014-04-17 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated MAPREDUCE-5843:
-

Status: Patch Available  (was: Open)

 TestMRKeyValueTextInputFormat failing on Windows
 

 Key: MAPREDUCE-5843
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5843
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-mapreduce-5843.1.patch, apache-yarn-1956.0.patch


 TestMRKeyValueTextInputFormat fails intermittently on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (MAPREDUCE-5844) Reducer Preemption is too aggressive

2014-04-17 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe moved YARN-1955 to MAPREDUCE-5844:
-

Key: MAPREDUCE-5844  (was: YARN-1955)
Project: Hadoop Map/Reduce  (was: Hadoop YARN)

 Reducer Preemption is too aggressive
 

 Key: MAPREDUCE-5844
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

 We observed cases where the reducer preemption makes the job finish much 
 later, and the preemption does not seem to be necessary since after 
 preemption both the preempted reducer and the mapper are assigned 
 immediately--meaning that there was already enough space for the mapper.
 The logic for triggering preemption is at 
 RMContainerAllocator::preemptReducesIfNeeded
 The preemption is triggered if the following is true:
 {code}
 headroom + am * |m| + pr * |r| < mapResourceRequest
 {code} 
 where am is the number of assigned mappers, |m| is the mapper size, pr is the 
 number of reducers being preempted, and |r| is the reducer size.
 The original idea apparently was that if the headroom is not big enough for the 
 new mapper requests, reducers should be preempted. This would work if the job 
 is alone in the cluster. Once we have queues, the headroom calculation 
 becomes more complicated and would require a separate headroom calculation 
 per queue/job.
 So, as a result, the headroom variable is effectively given up currently: *headroom is 
 always set to 0*. What this implies is that preemption 
 becomes very aggressive, not considering whether there is enough space for 
 the mappers or not.
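 
 To make the trigger concrete, a small illustrative calculation with made-up numbers (not taken from the actual RMContainerAllocator code):
 {code}
 long headroom = 0;            // always reported as 0, as described above
 int  assignedMaps = 0;        // am: no maps currently assigned after a fetch failure
 long mapSize = 2048;          // |m| in MB
 int  preemptingReducers = 0;  // pr: no reducers already being preempted
 long reduceSize = 4096;       // |r| in MB
 long mapResourceRequest = 2048;
 
 boolean preempt = headroom + assignedMaps * mapSize
     + preemptingReducers * reduceSize < mapResourceRequest;
 // 0 + 0 + 0 < 2048 -> true: a reducer gets preempted even if the cluster
 // actually has free space, because the zero headroom hides it.
 {code}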



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive

2014-04-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973318#comment-13973318
 ] 

Jason Lowe commented on MAPREDUCE-5844:
---

Moved this to MAPREDUCE since the decision to preempt reducers for mappers is 
ultimately an MR AM decision and not a YARN decision.

A headroom of zero should mean there is literally no more room in the queue, 
and I would expect the job would need to take action in those cases to make 
progress in light of fetch failures.  (e.g.: think of a scenario where all the 
other jobs taking up resources are long-running and won't release resources 
anytime soon)

If you are seeing cases where reducers are shot then immediately relaunched 
along with the failed maps then that implies that either the headroom 
calculation is wrong or resources happened to be freed right at the time the 
new containers were requested.  Note that there are a number of issues with 
headroom calculations, see YARN-1198 and related JIRAs.

Assuming those are fixed, there might be some usefulness to a grace period 
where we wait for other apps to free up resources in the queue to avoid 
shooting reducers.  A proper value for that probably depends upon how much work 
would be lost by the reducers in question, how long we can tolerate waiting to 
try to preserve that work, and how likely it is that another app will free up 
resources anytime soon.  If we wait and still don't get our resources then 
that's purely worse than a job that took decisive action as soon as a map 
retroactively failed and there's no more space left in the queue.   Also if the 
headroom is zero because a single job has hit user limits within the queue then 
waiting serves no purpose -- it has to shoot a reducer in that case to make 
progress.  In that latter case we'd need additional information in the allocate 
response from the scheduler to know that waiting for resources to be released 
from other applications in the queue isn't going to work.

It would be good to verify from the RM logs what is happening in your case.  If 
the headroom calculation is wrong then we should fix that, otherwise if 
resources are churning quickly then a grace period before preempting reducers 
may make sense.

 Reducer Preemption is too aggressive
 

 Key: MAPREDUCE-5844
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

 We observed cases where the reducer preemption makes the job finish much 
 later, and the preemption does not seem to be necessary since after 
 preemption both the preempted reducer and the mapper are assigned 
 immediately--meaning that there was already enough space for the mapper.
 The logic for triggering preemption is at 
 RMContainerAllocator::preemptReducesIfNeeded
 The preemption is triggered if the following is true:
 {code}
 headroom + am * |m| + pr * |r| < mapResourceRequest
 {code} 
 where am is the number of assigned mappers, |m| is the mapper size, pr is the 
 number of reducers being preempted, and |r| is the reducer size.
 The original idea apparently was that if the headroom is not big enough for the 
 new mapper requests, reducers should be preempted. This would work if the job 
 is alone in the cluster. Once we have queues, the headroom calculation 
 becomes more complicated and would require a separate headroom calculation 
 per queue/job.
 So, as a result, the headroom variable is effectively given up currently: *headroom is 
 always set to 0*. What this implies is that preemption 
 becomes very aggressive, not considering whether there is enough space for 
 the mappers or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5843) TestMRKeyValueTextInputFormat failing on Windows

2014-04-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973404#comment-13973404
 ] 

Hadoop QA commented on MAPREDUCE-5843:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12640682/apache-mapreduce-5843.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4531//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4531//console

This message is automatically generated.

 TestMRKeyValueTextInputFormat failing on Windows
 

 Key: MAPREDUCE-5843
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5843
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-mapreduce-5843.1.patch, apache-yarn-1956.0.patch


 TestMRKeyValueTextInputFormat fails intermittently on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5845) TestShuffleHandler failing intermittently on windows

2014-04-17 Thread Varun Vasudev (JIRA)
Varun Vasudev created MAPREDUCE-5845:


 Summary: TestShuffleHandler failing intermittently on windows
 Key: MAPREDUCE-5845
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5845
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Varun Vasudev
Assignee: Varun Vasudev


TestShuffleHandler fails intermittently on Windows - specifically, 
testClientClosesConnection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5845) TestShuffleHandler failing intermittently on windows

2014-04-17 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated MAPREDUCE-5845:
-

Attachment: apache-mapreduce-5845.0.patch

 TestShuffleHandler failing intermittently on windows
 

 Key: MAPREDUCE-5845
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5845
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-mapreduce-5845.0.patch


 TestShuffleHandler fails intermittently on Windows - specifically, 
 testClientClosesConnection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5845) TestShuffleHandler failing intermittently on windows

2014-04-17 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated MAPREDUCE-5845:
-

Status: Patch Available  (was: Open)

 TestShuffleHandler failing intermittently on windows
 

 Key: MAPREDUCE-5845
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5845
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-mapreduce-5845.0.patch


 TestShuffleHandler fails intermittently on Windows - specifically, 
 testClientClosesConnection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive

2014-04-17 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973430#comment-13973430
 ] 

Maysam Yabandeh commented on MAPREDUCE-5844:


Thanks [~jlowe] for your detailed comment.

# As I explained in the description of the JIRA, the printed headroom in the 
logs is always zero, e.g.,
{code}
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for 
application_x: ask=8 release= 0 newContainers=0 finishedContainers=0 
resourcelimit=<memory:0, vCores:0> knownNMs=x
{code}
And this is not because there is no headroom (I know that by checking the 
available resources while the job was running).
# I actually was not surprised that the headroom is always set to zero, since I 
found the headroom field effectively abandoned in the FairScheduler source code: 
SchedulerApplicationAttempt#getHeadroom() is what populates the headroom field 
in the response, while SchedulerApplicationAttempt#setHeadroom() is never 
invoked in FairScheduler (it is invoked in the capacity and FIFO schedulers, 
though).
# I assumed that not invoking setHeadroom in the fair scheduler was intentional, 
perhaps due to the complications of computing the headroom when fair share is 
taken into account. But based on your comment, I understand that this could be 
a forgotten case rather than an abandoned one.
# At least in the observed case where we suffered from this problem, the 
headroom was available and both the preempted reducer and the mapper were 
assigned immediately (within a few seconds). So, delaying the preemption even 
for a period as short as 1 minute could prevent this problem, while not having 
a tangible negative impact in cases where the preemption was actually required. 
I agree that there are tradeoffs with this preemption delay (especially when it 
is high), but even a short value would suffice to cover this special case where 
the headroom is already available.
# Whether or not we end up with a fix for the headroom calculation in 
FairScheduler, it seems to me that allowing the user to configure the 
preemption to be postponed for a short delay would not be harmful, if not 
beneficial.
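
As a rough sketch only (not the actual scheduler code), this is the kind of update the capacity/FIFO schedulers perform on their allocate path but that is missing in FairScheduler per the comment above; "queueFairShare" and "queueUsage" are hypothetical values:
{code}
Resource availableForApp = Resources.subtract(queueFairShare, queueUsage);
application.setHeadroom(availableForApp);  // SchedulerApplicationAttempt#setHeadroom
{code}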

 Reducer Preemption is too aggressive
 

 Key: MAPREDUCE-5844
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

 We observed cases where the reducer preemption makes the job finish much 
 later, and the preemption does not seem to be necessary since after 
 preemption both the preempted reducer and the mapper are assigned 
 immediately--meaning that there was already enough space for the mapper.
 The logic for triggering preemption is at 
 RMContainerAllocator::preemptReducesIfNeeded
 The preemption is triggered if the following is true:
 {code}
 headroom + am * |m| + pr * |r| < mapResourceRequest
 {code} 
 where am is the number of assigned mappers, |m| is the mapper size, pr is the 
 number of reducers being preempted, and |r| is the reducer size.
 The original idea apparently was that if the headroom is not big enough for the 
 new mapper requests, reducers should be preempted. This would work if the job 
 is alone in the cluster. Once we have queues, the headroom calculation 
 becomes more complicated and would require a separate headroom calculation 
 per queue/job.
 So, as a result, the headroom variable is effectively given up currently: *headroom is 
 always set to 0*. What this implies is that preemption 
 becomes very aggressive, not considering whether there is enough space for 
 the mappers or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5812) Make task context available to OutputCommitter.isRecoverySupported()

2014-04-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973434#comment-13973434
 ] 

Ashutosh Chauhan commented on MAPREDUCE-5812:
-

What's the status of this issue? It would be good if this were available in 2.5.

  Make task context available to OutputCommitter.isRecoverySupported()
 -

 Key: MAPREDUCE-5812
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5812
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.3.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Fix For: 2.5.0

 Attachments: MAPREDUCE-5812.1.patch


  Background
  ==
  Systems like Hive provide their own version of OutputCommitter. The custom 
  implementation of isRecoverySupported() requires the task context. From 
  taskContext.getConfiguration(), Hive checks whether a specific Hive-defined 
  property is set or not. Based on the property value, it returns true or 
  false. However, in the current OutputCommitter.isRecoverySupported(), there 
  is no way of getting the task config. As a result, the user can't turn the 
  MR AM recovery feature on or off.
  Proposed resolution:
  ===
  1. Pass the task context into the isRecoverySupported() method.
  Pros: Easy and clean
  Cons: Possible backward compatibility issue due to API changes. (Is it true?)
  2. Call outputCommitter.setupTask(taskContext) from the MR AM: the new 
  OutputCommitter will store the context in a class-level variable and use it 
  from isRecoverySupported().
  Pros: No API changes. No backward compatibility issue. This call can be made 
  from the MRAppMaster.getOutputCommitter() method for the old API case.
  Cons: Might not be a very clean solution due to the class-level variable.
  Please give your comments.
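 
 A minimal sketch of option 2 from the committer's side, using a hypothetical committer class and a made-up property name (not Hive's actual implementation):
 {code}
 public class RecoveryAwareCommitter extends FileOutputCommitter {
   private volatile TaskAttemptContext cachedContext;
 
   public RecoveryAwareCommitter(Path outputPath, TaskAttemptContext context) throws IOException {
     super(outputPath, context);
   }
 
   @Override
   public void setupTask(TaskAttemptContext context) throws IOException {
     this.cachedContext = context;  // the class-level variable mentioned in the cons above
     super.setupTask(context);
   }
 
   @Override
   public boolean isRecoverySupported() {
     // "example.recovery.enabled" is a made-up key used only for illustration.
     return cachedContext != null
         && cachedContext.getConfiguration().getBoolean("example.recovery.enabled", true);
   }
 }
 {code}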



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-04-17 Thread David Rosenstrauch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973443#comment-13973443
 ] 

David Rosenstrauch commented on MAPREDUCE-5402:
---

Thanks much for the pointers Tsuyoshi.  Will give this a try tonight.

Also, just wondering:  any thoughts on if/how it might be possible to backport 
this to mrv1?

 DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
 --

 Key: MAPREDUCE-5402
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp, mrv2
Reporter: David Rosenstrauch
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
 MAPREDUCE-5402.3.patch


 In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
 describes the implementation of DynamicInputFormat, with one of the main 
 motivations cited being to reduce the chance of long-tails where a few 
 leftover mappers run much longer than the rest.
 However, I today ran into a situation where I experienced exactly such a long 
 tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
 the problem by overriding the number of mappers and the split ratio used by 
 the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
 set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
 This constant is actually set quite low for production use.  (See a 
 description of my use case below.)  And although MAPREDUCE-2765 states that 
 this is an overridable maximum, when reading through the code there does 
 not actually appear to be any mechanism available to override it.
 This should be changed.  It should be possible to expand the maximum # of 
 chunks beyond this arbitrary limit.
 For example, here is the situation I ran into today:
 I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
 The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
 the number of mappers for the job from the default of 20 to 128, so as to 
 more properly parallelize the copy across the cluster.  The number of chunk 
 files created was calculated as 241, and mapred.num.entries.per.chunk was 
 calculated as 12.
 As the job ran on, it reached a point where there were only 4 remaining map 
 tasks, which had each been running for over 2 hours.  The reason for this was 
 that each of the 12 files that those mappers were copying were quite large 
 (several hundred megabytes in size) and took ~20 minutes each.  However, 
 during this time, all the other 124 mappers sat idle.
 In theory I should be able to alleviate this problem with DynamicInputFormat. 
  If I were able to, say, quadruple the number of chunk files created, that 
 would have made each chunk contain only 3 files, and these large files would 
 have gotten distributed better around the cluster and copied in parallel.
 However, when I tried to do that - by overriding mapred.listing.split.ratio 
 to, say, 10 - DynamicInputFormat responded with an exception (Too many 
 chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
 split-ratio to proceed.) - presumably because I exceeded the 
 MAX_CHUNKS_TOLERABLE value of 400.
 Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
 can't personally see any.
 If this limit has no particular logic behind it, then it should be 
 overridable - or even better:  removed altogether.  After all, I'm not sure I 
 see any need for it.  Even if numMaps * splitRatio resulted in an 
 extraordinarily large number, if the code were modified so that the number of 
 chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
 there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
 where the product of numMaps and splitRatio is large, capping the number of 
 chunks at the number of files (numberOfChunks = numberOfFiles) would result 
 in 1 file per chunk - the maximum parallelization possible.  That may not be 
 the best-tuned solution for some users, but I would think that it should be 
 left up to the user to deal with the potential consequence of not having 
 tuned their job properly.  Certainly that would be better than having an 
 arbitrary hard-coded limit that *prevents* proper parallelization when 
 dealing with large files and/or large numbers of mappers.
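 
 Concretely, the cap proposed above reduces to the following; a minimal sketch with hypothetical variable names, not the existing DynamicInputFormat code:
 {code}
 int requestedChunks = numMaps * splitRatio;
 int numberOfChunks  = Math.min(requestedChunks, numberOfFiles);
 // Worst case: one file per chunk, i.e. the maximum possible parallelism,
 // which would remove the need for a MAX_CHUNKS_TOLERABLE limit.
 {code}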



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5846) Rumen doesn't understand JobQueueChangedEvent

2014-04-17 Thread Nathan Roberts (JIRA)
Nathan Roberts created MAPREDUCE-5846:
-

 Summary: Rumen doesn't understand JobQueueChangedEvent
 Key: MAPREDUCE-5846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5846
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: trunk, 2.4.0
Reporter: Nathan Roberts


MAPREDUCE-5732 introduced a JobQueueChangeEvent to jhist files. Rumen fails to 
parse jhist files containing this event. 




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5846) Rumen doesn't understand JobQueueChangedEvent

2014-04-17 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated MAPREDUCE-5846:
--

Attachment: MAPREDUCE-5846.patch

 Rumen doesn't understand JobQueueChangedEvent
 -

 Key: MAPREDUCE-5846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5846
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: trunk, 2.4.0
Reporter: Nathan Roberts
 Attachments: MAPREDUCE-5846.patch


 MAPREDUCE-5732 introduced a JobQueueChangeEvent to jhist files. Rumen fails 
 to parse jhist files containing this event. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MAPREDUCE-5846) Rumen doesn't understand JobQueueChangedEvent

2014-04-17 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts reassigned MAPREDUCE-5846:
-

Assignee: Nathan Roberts

 Rumen doesn't understand JobQueueChangedEvent
 -

 Key: MAPREDUCE-5846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5846
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: trunk, 2.4.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: MAPREDUCE-5846.patch


 MAPREDUCE-5732 introduced a JobQueueChangeEvent to jhist files. Rumen fails 
 to parse jhist files containing this event. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5845) TestShuffleHandler failing intermittently on windows

2014-04-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973482#comment-13973482
 ] 

Hadoop QA commented on MAPREDUCE-5845:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12640707/apache-mapreduce-5845.0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4532//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4532//console

This message is automatically generated.

 TestShuffleHandler failing intermittently on windows
 

 Key: MAPREDUCE-5845
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5845
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-mapreduce-5845.0.patch


 TestShuffleHandler fails intermittently on Windows - specifically, 
 testClientClosesConnection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive

2014-04-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973526#comment-13973526
 ] 

Jason Lowe commented on MAPREDUCE-5844:
---

Ah, I didn't realize this was the FairScheduler.  Yeah, if headroom is always 
zero that's going to wreak havoc as soon as any fetch failure occurs when 
reducers have been launched.  The priority should be to fix the headroom 
calculation in the scheduler first.  I suspect once that's done then for most 
cases there won't be such a need for grace period support in the AMs.

 Reducer Preemption is too aggressive
 

 Key: MAPREDUCE-5844
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

 We observed cases where the reducer preemption makes the job finish much 
 later, and the preemption does not seem to be necessary since after 
 preemption both the preempted reducer and the mapper are assigned 
 immediately--meaning that there was already enough space for the mapper.
 The logic for triggering preemption is at 
 RMContainerAllocator::preemptReducesIfNeeded
 The preemption is triggered if the following is true:
 {code}
 headroom + am * |m| + pr * |r| < mapResourceRequest
 {code} 
 where am is the number of assigned mappers, |m| is the mapper size, pr is the 
 number of reducers being preempted, and |r| is the reducer size.
 The original idea apparently was that if headroom is not big enough for the 
 new mapper requests, reducers should be preempted. This would work if the job 
 is alone in the cluster. Once we have queues, the headroom calculation 
 becomes more complicated and it would require a separate headroom calculation 
 per queue/job.
 So, as a result, the headroom variable has effectively been given up on: 
 *headroom is always set to 0*. The implication is that preemption becomes 
 very aggressive, without considering whether there is actually enough space 
 for the mappers.
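 For readers unfamiliar with the check, here is a simplified sketch of the 
 trigger described above; the names are illustrative and do not mirror 
 RMContainerAllocator exactly:
 {code}
 // Simplified sketch of the preemption trigger quoted in this issue.
 // Names are illustrative; the real logic lives in
 // RMContainerAllocator#preemptReducesIfNeeded and is more involved.
 public final class PreemptionCheckSketch {
   private PreemptionCheckSketch() {}

   public static boolean shouldPreemptReducers(long headroomMb,
       int assignedMaps, long mapSizeMb,
       int preemptingReducers, long reducerSizeMb,
       long mapResourceRequestMb) {
     // Left-hand side of the condition described in the issue.
     long lhs = headroomMb
         + assignedMaps * mapSizeMb
         + preemptingReducers * reducerSizeMb;
     // Preemption is triggered while this is still smaller than what
     // one pending map container needs.
     return lhs < mapResourceRequestMb;
   }
 }
 {code}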



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive

2014-04-17 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973572#comment-13973572
 ] 

Maysam Yabandeh commented on MAPREDUCE-5844:


Thanks [~jlowe]

[~sandyr], [~tucu00], could you please comment on the plan for setting headroom 
in the fair scheduler's responses to apps? Or perhaps I am misreading the code 
and it is already there but not working! Should I open a jira for that?

 Reducer Preemption is too aggressive
 

 Key: MAPREDUCE-5844
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh

 We observed cases where the reducer preemption makes the job finish much 
 later, and the preemption does not seem to be necessary since after 
 preemption both the preempted reducer and the mapper are assigned 
 immediately--meaning that there was already enough space for the mapper.
 The logic for triggering preemption is at 
 RMContainerAllocator::preemptReducesIfNeeded
 The preemption is triggered if the following is true:
 {code}
 headroom + am * |m| + pr * |r| < mapResourceRequest
 {code} 
 where am is the number of assigned mappers, |m| is the mapper size, pr is the 
 number of reducers being preempted, and |r| is the reducer size.
 The original idea apparently was that if headroom is not big enough for the 
 new mapper requests, reducers should be preempted. This would work if the job 
 is alone in the cluster. Once we have queues, the headroom calculation 
 becomes more complicated and it would require a separate headroom calculation 
 per queue/job.
 So, as a result, the headroom variable has effectively been given up on: 
 *headroom is always set to 0*. The implication is that preemption becomes 
 very aggressive, without considering whether there is actually enough space 
 for the mappers.
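 To make the effect of a zero headroom concrete (a hypothetical but 
 representative case): suppose the scheduler reports headroom = 0, no mappers 
 are currently assigned (am = 0), and no reducers are already being preempted 
 (pr = 0). The trigger condition then reduces to
 {code}
 0 + 0 * |m| + 0 * |r| < mapResourceRequest
 {code}
 which holds for any non-zero map request, so reducers start being preempted 
 immediately, regardless of how much capacity the queue actually has free.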



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-04-17 Thread David Rosenstrauch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973796#comment-13973796
 ] 

David Rosenstrauch commented on MAPREDUCE-5402:
---

Never mind - I think I've got it.  Your MAPREDUCE-5402.3.patch file can be 
applied directly against the code in Harsh's backport git repo (using patch -p3).

Built the code, and starting to test now ...

 DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
 --

 Key: MAPREDUCE-5402
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp, mrv2
Reporter: David Rosenstrauch
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
 MAPREDUCE-5402.3.patch


 In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
 describes the implementation of DynamicInputFormat, with one of the main 
 motivations cited being to reduce the chance of long-tails where a few 
 leftover mappers run much longer than the rest.
 However, I today ran into a situation where I experienced exactly such a long 
 tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate 
 the problem by overriding the number of mappers and the split ratio used by 
 the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
 set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
 This constant is actually set quite low for production use.  (See a 
 description of my use case below.)  And although MAPREDUCE-2765 states that 
 this is an "overridable maximum", when reading through the code there does 
 not actually appear to be any mechanism available to override it.
 This should be changed.  It should be possible to expand the maximum # of 
 chunks beyond this arbitrary limit.
 For example, here is the situation I ran into today:
 I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
 The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
 the number of mappers for the job from the default of 20 to 128, so as to 
 more properly parallelize the copy across the cluster.  The number of chunk 
 files created was calculated as 241, and mapred.num.entries.per.chunk was 
 calculated as 12.
 As the job ran on, it reached a point where there were only 4 remaining map 
 tasks, which had each been running for over 2 hours.  The reason for this was 
 that each of the 12 files that those mappers were copying were quite large 
 (several hundred megabytes in size) and took ~20 minutes each.  However, 
 during this time, all the other 124 mappers sat idle.
 In theory I should be able to alleviate this problem with DynamicInputFormat. 
  If I were able to, say, quadruple the number of chunk files created, that 
 would have made each chunk contain only 3 files, and these large files would 
 have gotten distributed better around the cluster and copied in parallel.
 However, when I tried to do that - by overriding mapred.listing.split.ratio 
 to, say, 10 - DynamicInputFormat responded with an exception ("Too many 
 chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
 split-ratio to proceed.") - presumably because I exceeded the 
 MAX_CHUNKS_TOLERABLE value of 400.
 Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
 can't personally see any.
 If this limit has no particular logic behind it, then it should be 
 overridable - or even better:  removed altogether.  After all, I'm not sure I 
 see any need for it.  Even if numMaps * splitRatio resulted in an 
 extraordinarily large number, if the code were modified so that the number of 
 chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
 there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
 where the product of numMaps and splitRatio is large, capping the number of 
 chunks at the number of files (numberOfChunks = numberOfFiles) would result 
 in 1 file per chunk - the maximum parallelization possible.  That may not be 
 the best-tuned solution for some users, but I would think that it should be 
 left up to the user to deal with the potential consequence of not having 
 tuned their job properly.  Certainly that would be better than having an 
 arbitrary hard-coded limit that *prevents* proper parallelization when 
 dealing with large files and/or large numbers of mappers.
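 For illustration, here is a minimal sketch of what an overridable limit could 
 look like. The configuration key and default below are assumptions made for 
 this example and may not match the attached MAPREDUCE-5402 patch:
 {code}
 // Hypothetical sketch of an overridable limit. The property name and default
 // below are assumptions for illustration and may not match the attached patch.
 import org.apache.hadoop.conf.Configuration;

 public final class MaxChunksSketch {
   private MaxChunksSketch() {}

   /** Illustrative configuration key -- not necessarily the real one. */
   public static final String MAX_CHUNKS_TOLERABLE_KEY =
       "distcp.dynamic.max.chunks.tolerable";
   public static final int DEFAULT_MAX_CHUNKS_TOLERABLE = 400;

   public static int getMaxChunksTolerable(Configuration conf) {
     // Fall back to the historical hard-coded value when the key is unset.
     return conf.getInt(MAX_CHUNKS_TOLERABLE_KEY, DEFAULT_MAX_CHUNKS_TOLERABLE);
   }
 }
 {code}
 A user hitting the "Too many chunks" exception could then raise the limit for 
 a single job rather than being blocked by the hard-coded constant.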



--
This message was sent by Atlassian JIRA
(v6.2#6252)