[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom

2016-04-06 Thread Jason Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229567#comment-15229567
 ] 

Jason Wang commented on MAPREDUCE-6302:
---

Was it backported?

> Preempt reducers after a configurable timeout irrespective of headroom
> --
>
> Key: MAPREDUCE-6302
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: mai shurong
> Assignee: Karthik Kambatla
> Priority: Critical
> Fix For: 2.8.0
>
> Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, 
> MAPREDUCE-6302.branch-2.6.0001.patch, MAPREDUCE-6302.branch-2.7.0001.patch, 
> log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, 
> mr-6302-5.patch, mr-6302-6.patch, mr-6302-7.patch, mr-6302-prelim.patch, 
> mr-6302_branch-2.patch, queue_with_max163cores.png, 
> queue_with_max263cores.png, queue_with_max333cores.png
>
>
> I submitted a big job, with 500 maps and 350 reduces, to a
> queue (FairScheduler) with a maximum of 300 cores. When the job's maps
> reached 100%, 300 reduces had occupied all 300 cores in the queue.
> Then a map failed and was retried, waiting for a core, while the 300 reduces
> were waiting for the failed map to finish, so a deadlock occurred. As a
> result, the job is blocked, and later jobs in the queue cannot run because
> no cores are available in the queue.
> I think there is a similar issue for the memory of a queue.
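
The stall described above, and the timeout named in the issue title, can be
sketched in a few lines of Java. This is only an illustration under stated
assumptions, not the committed patch: the class, method, and parameter names
are hypothetical, and the sketch assumes the headroom the AM sees may
overstate what the queue can actually grant.

```java
// Hypothetical sketch; none of these names come from the Hadoop code base.
final class ReducerPreemptionSketch {

    /**
     * Decides whether running reducers should be preempted to make room for maps.
     *
     * @param pendingMaps          maps waiting for a container
     * @param headroomCores        cores reported as still available (may be optimistic)
     * @param coresPerMap          cores a single map needs
     * @param mapsWaitingMillis    how long the pending maps have gone unassigned
     * @param preemptTimeoutMillis configurable timeout; a negative value disables it
     */
    static boolean shouldPreemptReducers(int pendingMaps, int headroomCores,
            int coresPerMap, long mapsWaitingMillis, long preemptTimeoutMillis) {
        if (pendingMaps == 0) {
            return false; // nothing is starved
        }
        if (headroomCores < coresPerMap) {
            return true; // headroom check: no room for even one map
        }
        // Timeout check: headroom claims there is room, yet the maps have
        // been waiting too long, so preempt irrespective of headroom.
        return preemptTimeoutMillis >= 0 && mapsWaitingMillis >= preemptTimeoutMillis;
    }

    public static void main(String[] args) {
        // One retried map, reported headroom of 10 cores, waiting for 60 s.
        System.out.println(shouldPreemptReducers(1, 10, 1, 60_000L, -1L));
        System.out.println(shouldPreemptReducers(1, 10, 1, 60_000L, 30_000L));
    }
}
```

With the timeout disabled, the first call relies on headroom alone and leaves
the reducers in place; with a 30-second timeout, the second call preempts.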



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6647) MR usage counters use the resources requested instead of the resources allocated

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229447#comment-15229447
 ] 

Hudson commented on MAPREDUCE-6647:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9572 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9572/])
MAPREDUCE-6647. MR usage counters use the resources requested instead of 
(rkanter: rev 3be1ab485f557c8a3c6a5066453f24d8d61f30be)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java


> MR usage counters use the resources requested instead of the resources 
> allocated
> 
>
> Key: MAPREDUCE-6647
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6647
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Fix For: 2.9.0
>
> Attachments: mapreduce6647.001.patch, mapreduce6647.002.patch, 
> mapreduce6647.003.patch, mapreduce6647.004.patch
>
>
> As can be seen in the following snippet, the MR counters for usage use the 
> resources requested instead of the resources allocated. The scheduler 
> increment-allocation-mb configs could lead to these values not being the 
> same. We could change the counters to use the allocated resources in order to 
> account for this.
> {code}
>   private static void updateMillisCounters(JobCounterUpdateEvent jce,
>       TaskAttemptImpl taskAttempt) {
>     /***omitted**/
>     long duration = (taskAttempt.getFinishTime() - taskAttempt.getLaunchTime());
>     int mbRequired =
>         taskAttempt.getMemoryRequired(taskAttempt.conf, taskType);
>     int vcoresRequired = taskAttempt.getCpuRequired(taskAttempt.conf, taskType);
>     int minSlotMemSize = taskAttempt.conf.getInt(
>         YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
>         YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
>     int simSlotsRequired =
>         minSlotMemSize == 0 ? 0 : (int) Math.ceil((float) mbRequired
>             / minSlotMemSize);
>     if (taskType == TaskType.MAP) {
>       jce.addCounterUpdate(JobCounter.SLOTS_MILLIS_MAPS, simSlotsRequired * duration);
>       jce.addCounterUpdate(JobCounter.MB_MILLIS_MAPS, duration * mbRequired);
>       jce.addCounterUpdate(JobCounter.VCORES_MILLIS_MAPS, duration * vcoresRequired);
>       jce.addCounterUpdate(JobCounter.MILLIS_MAPS, duration);
>     } else {
>       jce.addCounterUpdate(JobCounter.SLOTS_MILLIS_REDUCES, simSlotsRequired * duration);
>       jce.addCounterUpdate(JobCounter.MB_MILLIS_REDUCES, duration * mbRequired);
>       jce.addCounterUpdate(JobCounter.VCORES_MILLIS_REDUCES, duration * vcoresRequired);
>       jce.addCounterUpdate(JobCounter.MILLIS_REDUCES, duration);
>     }
> {code}
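
To make the requested-versus-allocated gap concrete, here is a small
self-contained sketch. It assumes the scheduler rounds a request up to the
next multiple of its increment setting; that rounding rule, the class name,
and the sample numbers are illustrative assumptions, not taken from the patch.

```java
// Hypothetical sketch; the rounding rule is an assumption about how the
// scheduler's increment-allocation-mb setting normalizes a request.
final class AllocationRoundingSketch {

    /** Rounds requestedMb up to the next multiple of incrementMb. */
    static int roundUp(int requestedMb, int incrementMb) {
        if (incrementMb <= 0) {
            return requestedMb; // no increment configured: nothing to round
        }
        return ((requestedMb + incrementMb - 1) / incrementMb) * incrementMb;
    }

    /** The MB_MILLIS style counter value: memory multiplied by task duration. */
    static long mbMillis(int mb, long durationMillis) {
        return (long) mb * durationMillis;
    }

    public static void main(String[] args) {
        int requestedMb = 1000;
        int allocatedMb = roundUp(requestedMb, 512);
        long durationMillis = 60_000L;
        System.out.println("requested MB-millis: " + mbMillis(requestedMb, durationMillis));
        System.out.println("allocated MB-millis: " + mbMillis(allocatedMb, durationMillis));
    }
}
```

Under these assumptions a 1000 MB request becomes a 1024 MB container, so a
counter based on the request under-reports usage by 24 MB for every
millisecond the task runs.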





[jira] [Updated] (MAPREDUCE-6647) MR usage counters use the resources requested instead of the resources allocated

2016-04-06 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6647:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

Thanks Haibo.  Committed to trunk!



[jira] [Commented] (MAPREDUCE-6647) MR usage counters use the resources requested instead of the resources allocated

2016-04-06 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229412#comment-15229412
 ] 

Robert Kanter commented on MAPREDUCE-6647:
--

And branch-2!



[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229405#comment-15229405
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6513:


[~varun_saxena], let me know if you can update this within a couple of days,
in time for 2.7.3. Otherwise, we can simply move this to 2.8 in a few weeks.

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, resourcemanager
> Affects Versions: 2.7.0
> Reporter: Bob.zhao
> Assignee: Varun Saxena
> Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch
>
>
> While a job with many tasks was in progress, one node became unstable
> due to an OS issue. After the node became unstable, the maps on this node
> changed to the KILLED state.
> The maps that were running on the unstable node were rescheduled; all of
> them are in the SCHEDULED state, waiting for the RM to assign containers.
> Ask requests for maps were seen until the node was good again (all of those
> failed), and there were no ask requests after that. But the AM keeps on
> preempting the reducers (it's recycling).
> In the end, the reducers are waiting for the mappers to complete, and the
> mappers never get containers.
> My question is:
>
> Why did the AM not send map requests once the node had recovered?





[jira] [Commented] (MAPREDUCE-6647) MR usage counters use the resources requested instead of the resources allocated

2016-04-06 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229357#comment-15229357
 ] 

Robert Kanter commented on MAPREDUCE-6647:
--

+1  Will commit shortly



[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2016-04-06 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229192#comment-15229192
 ] 

Haibo Chen commented on MAPREDUCE-5124:
---

[~jlowe], [~ozawa], [~revans2] Does increasing the task report interval sound 
like a viable approach to alleviate the issue here? Right now the report 
interval is hardcoded as 3 seconds. 
We could make the task report interval configurable and increase the interval 
with some heuristics to limit the number of task status updates per unit time.
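
One possible shape for such a heuristic, as a rough sketch only: stretch the
interval so that the expected number of status updates per second stays under
a cap. The class name, the cap parameter, and the formula are hypothetical;
the 3-second base mirrors the hardcoded value mentioned above.

```java
// Hypothetical sketch; names and the formula are illustrative assumptions.
final class ReportIntervalSketch {

    /**
     * Picks a report interval so that activeTasks tasks together send at most
     * maxUpdatesPerSecond status updates, never going below the base interval.
     */
    static long reportIntervalMillis(int activeTasks, int maxUpdatesPerSecond,
            long baseIntervalMillis) {
        if (activeTasks <= 0 || maxUpdatesPerSecond <= 0) {
            return baseIntervalMillis; // nothing to limit
        }
        // Interval at which activeTasks reports arrive exactly at the cap rate.
        long needed = (long) Math.ceil(activeTasks * 1000.0 / maxUpdatesPerSecond);
        return Math.max(baseIntervalMillis, needed);
    }

    public static void main(String[] args) {
        // Small job: the base interval is already slow enough.
        System.out.println(reportIntervalMillis(1_000, 1_000, 3_000L));
        // Very large job: the interval is stretched to respect the cap.
        System.out.println(reportIntervalMillis(30_000, 1_000, 3_000L));
    }
}
```

A longer interval trades freshness of task progress for a bounded event rate
at the AM, so any real change would also need to consider progress timeouts.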

> AM lacks flow control for task events
> -
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha, 0.23.5
> Reporter: Jason Lowe
> Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.





[jira] [Commented] (MAPREDUCE-6658) TestMRJobs fails

2016-04-06 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229136#comment-15229136
 ] 

Eric Badger commented on MAPREDUCE-6658:


Neither of these test failures is reproducible on my local machine with the
patch applied.

> TestMRJobs fails
> 
>
> Key: MAPREDUCE-6658
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6658
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Reporter: Akira AJISAKA
> Assignee: Eric Badger
> Attachments: MAPREDUCE-6658.001.patch
>
>
> TestMRJobs#testJobWithChangePriority fails.
> {noformat}
> Running org.apache.hadoop.mapreduce.v2.TestMRJobs
> Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 446.855 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestMRJobs
> testJobWithChangePriority(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time elapsed: 21.477 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobWithChangePriority(TestMRJobs.java:276)
> {noformat}





[jira] [Commented] (MAPREDUCE-6658) TestMRJobs fails

2016-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229108#comment-15229108
 ] 

Hadoop QA commented on MAPREDUCE-6658:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 46s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 51s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 59s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.TestContainerManagerSecurity |
|   | hadoop.yarn.server.TestMiniYarnClusterNodeUtilization |
| JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.TestContainerManagerSecurity |
|   | hadoop.yarn.server.TestMiniYarnClusterNodeUtilization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12797375/MAPREDUCE-6658.001.patch |
| JIRA Issue | MAPREDUCE-6658 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | 

[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.

2016-04-06 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229066#comment-15229066
 ] 

Rushabh S Shah commented on MAPREDUCE-6633:
---

Ran the failed JUnit test on both JDK 7 and JDK 8.
Both runs passed on my machine.
{noformat}
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.54 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.tools.TestCLI
testGetJob(org.apache.hadoop.mapreduce.tools.TestCLI)  Time elapsed: 0.084 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.apache.hadoop.mapreduce.tools.TestCLI.testGetJob(TestCLI.java:181)
{noformat}

> AM should retry map attempts if the reduce task encounters commpression 
> related errors.
> ---
>
> Key: MAPREDUCE-6633
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.7.2
> Reporter: Rushabh S Shah
> Assignee: Rushabh S Shah
> Attachments: MAPREDUCE-6633.patch
>
>
> When a reduce task encounters compression-related errors, the AM doesn't
> retry the corresponding map task.
> Here is the stack trace from one of the cases we encountered.
> {noformat}
> 2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29
>   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>   at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
>   at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
>   at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>   at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> {noformat}
> In this case, the node on which the map task ran had a bad drive.
> If the AM had retried running that map task somewhere else, the job
> definitely would have succeeded.





[jira] [Commented] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters commpression related errors.

2016-04-06 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229032#comment-15229032
 ] 

Rushabh S Shah commented on MAPREDUCE-6633:
---

bq.  If there is a runtime exception on the reducer (memory error, NPE, etc.),
maps would be re-run unnecessarily.
In this case the decompressor threw a RuntimeException
(ArrayIndexOutOfBoundsException is a subclass).
If we had re-run the map on another node, the job would have succeeded.

bq. I am a little nervous about re-fetching for any exception.
I understand your concern, but I think it is a good change.
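
The trade-off in this exchange can be sketched as follows. This is not the
attached patch; the class, enum, and method names are hypothetical, and the
policy shown (treat any RuntimeException raised while reading a map output as
a fetch failure) is only one reading of the proposal.

```java
import java.io.IOException;

// Hypothetical sketch; it does not reproduce the Hadoop Fetcher code.
final class ShuffleFailureSketch {

    enum Action { REPORT_FETCH_FAILURE, FAIL_REDUCER }

    /** Classifies an error raised while copying or decompressing a map output. */
    static Action classify(Throwable error) {
        // I/O errors and runtime exceptions are blamed on the map output, so
        // the fetch is reported as failed and the map may be re-run elsewhere.
        if (error instanceof IOException || error instanceof RuntimeException) {
            return Action.REPORT_FETCH_FAILURE;
        }
        // Anything else (for example an OutOfMemoryError) fails the reducer.
        return Action.FAIL_REDUCER;
    }

    public static void main(String[] args) {
        // The decompressor failure from the stack trace in the description.
        System.out.println(classify(new ArrayIndexOutOfBoundsException()));
        // The reviewer's concern: a reducer-side bug is blamed on the map too.
        System.out.println(classify(new NullPointerException()));
        System.out.println(classify(new OutOfMemoryError()));
    }
}
```

Both the ArrayIndexOutOfBoundsException and the NullPointerException land in
the same branch, which is exactly why re-fetching for any exception can re-run
maps unnecessarily.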



[jira] [Updated] (MAPREDUCE-6658) TestMRJobs fails

2016-04-06 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated MAPREDUCE-6658:
---
Attachment: MAPREDUCE-6658.001.patch

[~ajisakaa], I tracked down the issue. Indeed, it was my code that broke the
tests. I'm not sure why I wasn't able to reproduce the failures consistently,
but that may have been another side effect of my earlier changes. The bug was
that MiniYARNCluster explicitly called transitionToActive on the resource
manager with index 0. This overwrote some conf properties and reset the cluster
max priority property back to DEFAULT.

I'm attaching a patch that fixes this issue by only transitioning the RM with
index 0 to active if we are in an HA cluster.
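
The guard described above might look roughly like this. It is a stand-alone
sketch with stand-in types, not the MiniYARNCluster code or the attached
patch; FakeResourceManager, startCluster, and the haEnabled flag are all
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch using stand-in types instead of the real YARN classes.
final class HaActivationSketch {

    /** Stand-in for a resource manager that records whether it was transitioned. */
    static final class FakeResourceManager {
        boolean transitioned = false;

        void transitionToActive() {
            transitioned = true; // in the reported bug, this step reset conf properties
        }
    }

    /** Transitions the RM with index 0 to active only when HA is enabled. */
    static void startCluster(List<FakeResourceManager> rms, boolean haEnabled) {
        if (haEnabled && !rms.isEmpty()) {
            rms.get(0).transitionToActive();
        }
        // In the non-HA case no explicit transition is requested here.
    }

    public static void main(String[] args) {
        List<FakeResourceManager> rms = new ArrayList<>();
        rms.add(new FakeResourceManager());
        startCluster(rms, false);
        System.out.println("non-HA transitioned: " + rms.get(0).transitioned);
    }
}
```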



[jira] [Updated] (MAPREDUCE-6658) TestMRJobs fails

2016-04-06 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated MAPREDUCE-6658:
---
Status: Patch Available  (was: Open)



[jira] [Commented] (MAPREDUCE-6670) TestJobListCache#testEviction sometimes fails on Windows with timeout

2016-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228521#comment-15228521
 ] 

Hudson commented on MAPREDUCE-6670:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9567 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9567/])
MAPREDUCE-6670. TestJobListCache#testEviction sometimes fails on Windows 
(junping_du: rev de96d7c88a42cd54bd88ce2de63122998e967efa)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobListCache.java


> TestJobListCache#testEviction sometimes fails on Windows with timeout
> -
>
> Key: MAPREDUCE-6670
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6670
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2, 2.7.3
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6670.001.patch, MAPREDUCE-6670.002.patch
>
>
> TestJobListCache#testEviction often needs more than 1000 ms to finish in a 
> Windows environment. Increasing the timeout solves the issue.





[jira] [Commented] (MAPREDUCE-6649) getFailureInfo not returning any failure info

2016-04-06 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228518#comment-15228518
 ] 

Eric Badger commented on MAPREDUCE-6649:


[~eepayne], can you take a look at this patch and commit if it looks good?

> getFailureInfo not returning any failure info
> -
>
> Key: MAPREDUCE-6649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6649.001.patch, MAPREDUCE-6649.002.patch
>
>
> The following command does not produce any failure info as to why the job 
> failed. 
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar 
> $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar
>  sleep -Dmapreduce.jobtracker.split.metainfo.maxsize=10 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1 -rt 1
> {noformat}
> {noformat}
> 2016-03-07 10:34:58,112 INFO  [main] mapreduce.Job 
> (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0004 failed with 
> state FAILED due to: 
> {noformat}
> For contrast, here is a command and its command-line output for a failed 
> job that reports the correct failure info. 
> {noformat}
> $HADOOP_PREFIX/bin/hadoop jar 
> $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar
>  sleep -Dyarn.app.mapreduce.am.command-opts=-goober 
> -Dmapreduce.job.queuename=default -m 20 -r 0 -mt 3
> {noformat}
> {noformat}
> 2016-03-07 10:30:13,103 INFO  [main] mapreduce.Job 
> (Job.java:monitorAndPrintJob(1431)) - Job job_1457364518683_0003 failed with 
> state FAILED due to: Application application_1457364518683_0003 failed 3 
> times due to AM Container for appattempt_1457364518683_0003_03 exited 
> with  exitCode: 1
> Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1457364518683_0003_03_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
>   at org.apache.hadoop.util.Shell.run(Shell.java:838)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:319)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
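The first log line above ends right after "due to:" because the failure info string that is appended to it is empty. The snippet below is a minimal sketch of a client-side guard that makes the empty case visible. It is not the fix in the attached patches; the class `FailureMessage`, the method `describe`, and the placeholder text are hypothetical names chosen for illustration, and only the message layout is taken from the log output quoted above.

```java
// Hypothetical helper: formats a job-failure line and makes a missing
// failure reason explicit instead of leaving the line ending in "due to: ".
final class FailureMessage {

    // Placeholder used when no diagnostic text was reported (an assumption,
    // not a string that Hadoop itself prints).
    static final String NO_INFO = "(no failure info reported)";

    static String describe(String jobId, String state, String failureInfo) {
        boolean missing = failureInfo == null || failureInfo.trim().isEmpty();
        String reason = missing ? NO_INFO : failureInfo;
        return "Job " + jobId + " failed with state " + state + " due to: " + reason;
    }

    public static void main(String[] args) {
        // Empty failure info, as in the first log excerpt above.
        System.out.println(describe("job_example_0001", "FAILED", ""));
        // Populated failure info, as in the second log excerpt above.
        System.out.println(describe("job_example_0002", "FAILED",
                "Application failed 3 times due to AM Container exit"));
    }
}
```

A guard like this only changes what the client prints; the underlying problem reported here is that the failure reason never reaches the job status in the first place.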





[jira] [Updated] (MAPREDUCE-6670) TestJobListCache#testEviction sometimes fails on Windows with timeout

2016-04-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6670:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

I have committed the patch to trunk and branch-2. Thanks [~GergelyNovak] for 
delivering the patch, and congratulations on your first patch contribution to 
Apache Hadoop!

> TestJobListCache#testEviction sometimes fails on Windows with timeout
> -
>
> Key: MAPREDUCE-6670
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6670
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2, 2.7.3
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6670.001.patch, MAPREDUCE-6670.002.patch
>
>
> TestJobListCache#testEviction often needs more than 1000 ms to finish in a 
> Windows environment. Increasing the timeout solves the issue.





[jira] [Commented] (MAPREDUCE-6670) TestJobListCache#testEviction sometimes fails on Windows with timeout

2016-04-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228346#comment-15228346
 ] 

Junping Du commented on MAPREDUCE-6670:
---

+1. Will commit it shortly.

> TestJobListCache#testEviction sometimes fails on Windows with timeout
> -
>
> Key: MAPREDUCE-6670
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6670
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2, 2.7.3
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Minor
> Attachments: MAPREDUCE-6670.001.patch, MAPREDUCE-6670.002.patch
>
>
> TestJobListCache#testEviction often needs more than 1000 ms to finish in a 
> Windows environment. Increasing the timeout solves the issue.





[jira] [Commented] (MAPREDUCE-5937) hadoop/mapred job -history shows the counters twice in the output.

2016-04-06 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228050#comment-15228050
 ] 

Akira AJISAKA commented on MAPREDUCE-5937:
--

bq. So do you think that instead of using what the current patch is doing, we 
should change what the actual AbstractCounter#getGroupNames returns?
Yes. That way we can avoid duplicate entries for JSONHistoryViewerPrinter as 
well. The current patch ignores all of the deprecated counter names.
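The idea discussed here, having the group-name listing itself leave out legacy aliases so that every printer benefits, can be sketched as follows. This is a minimal illustration and not the patch's code: the class `CounterGroupNames`, the method `withoutLegacy`, and the sample name lists are hypothetical, and the real change would live inside Hadoop's counters classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: drops legacy (deprecated) group names from a listing
// so that each counter group is reported only once.
final class CounterGroupNames {

    static List<String> withoutLegacy(Iterable<String> allNames, Set<String> legacyNames) {
        List<String> current = new ArrayList<>();
        for (String name : allNames) {
            // Keep only names that are not registered as legacy aliases.
            if (!legacyNames.contains(name)) {
                current.add(name);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Sample data for illustration: one current group name and an
        // assumed legacy alias that refers to the same group.
        List<String> all = Arrays.asList(
                "org.apache.hadoop.mapreduce.TaskCounter",
                "org.apache.hadoop.mapred.Task$Counter");
        Set<String> legacy = new HashSet<>(
                Arrays.asList("org.apache.hadoop.mapred.Task$Counter"));

        // Every caller that iterates this result sees each group once.
        for (String name : withoutLegacy(all, legacy)) {
            System.out.println(name);
        }
    }
}
```

Filtering at the source rather than in one printer means the text and JSON history output stay consistent without each of them repeating the same check.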

> hadoop/mapred job -history shows the counters twice in the output.
> -
>
> Key: MAPREDUCE-5937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5937
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.7.2
>Reporter: Jinghui Wang
>Assignee: Andres Perez
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5937-branch-2.7-02.patch, 
> MAPREDUCE-5937-branch-2.7.2.002.patch, MAPREDUCE-5937.patch, 
> job_history_cli_sample.out
>
>
> The HistoryViewer#printCounters method uses AbstractCounter#getGroupNames, 
> which includes legacy groups and can cause duplicates in the CLI output.
> See attached example output.


