[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941211#comment-14941211 ]

Rohith Sharma K S commented on MAPREDUCE-6485:
--

Committing shortly.

> MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
> ---
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
> Reporter: Bob
> Assignee: Xianyin Xin
> Priority: Critical
> Attachments: MAPREDUCE-6485.001.patch, MAPREDUCE-6485.004.patch, MAPREDUCE-6485.005.patch, MAPREDUCE-6485.006.patch, MAPREDUCE-6845.002.patch, MAPREDUCE-6845.003.patch
>
>
> The scenario is as follows:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8, reduces take resources and start to run before all maps have finished.
> It can then happen that all resources are taken up by running reduces while one map is still unfinished.
> In this situation the last map has two task attempts.
> The first attempt was killed due to a timeout (mapreduce.task.timeout), and its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to FAILED. The failed map attempt is not restarted because a speculative map attempt is still in progress.
> The second attempt, which was started because map task speculation is enabled, is pending in the UNASSIGNED state because no resources are available.
> The request for the second map attempt has lower priority than the reduces, so preemption does not happen.
> As a result, none of the reduces can finish because one map is left, and the last map hangs because no resources are available, so the job never finishes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
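The priority problem described in the report can be illustrated with a toy model. This is only a sketch, not the application master's code: the class `MapPrioritySketch` and both of its methods are hypothetical, the rule "a pending map can reclaim a reducer's container only if it outranks reduces" is a simplification of the report's last paragraphs, and the numeric priorities (5 for a retried failed map, 10 for reduces, 20 for ordinary maps, lower value served first) are assumed rather than taken from this thread. The `fixApplied` flag stands in for the idea named in the commit summary quoted later in the thread, "Create a new task attempt with failed map task priority".

```java
final class MapPrioritySketch {
    // Assumed priority values; a lower number means the request is served first.
    static final int PRIORITY_FAST_FAIL_MAP = 5;
    static final int PRIORITY_REDUCE = 10;
    static final int PRIORITY_MAP = 20;

    /** Toy rule: a pending map may reclaim a reducer's container only if it outranks reduces. */
    static boolean canPreemptReducer(int pendingMapPriority) {
        return pendingMapPriority < PRIORITY_REDUCE;
    }

    /**
     * Priority of the still-pending attempt of a map task.
     * Hypothetical model: once a sibling attempt has failed and the fix is applied,
     * the pending attempt is requested with the failed-map priority.
     */
    static int priorityForPendingAttempt(boolean siblingAttemptFailed, boolean fixApplied) {
        return (siblingAttemptFailed && fixApplied) ? PRIORITY_FAST_FAIL_MAP : PRIORITY_MAP;
    }

    public static void main(String[] args) {
        int before = priorityForPendingAttempt(true, false);
        int after = priorityForPendingAttempt(true, true);
        System.out.println("without fix, reducer can be preempted: " + canPreemptReducer(before));
        System.out.println("with fix, reducer can be preempted: " + canPreemptReducer(after));
    }
}
```

In this model the pending speculative attempt stays at the ordinary map priority without the change, so it never outranks the running reduces; with the change it is treated like a retried failed map and can displace one.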
[jira] [Resolved] (MAPREDUCE-6488) Make buffer size in PipeMapRed configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Tianyi resolved MAPREDUCE-6488.
--
Resolution: Invalid

> Make buffer size in PipeMapRed configurable
> ---
>
> Key: MAPREDUCE-6488
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6488
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: He Tianyi
> Assignee: He Tianyi
>
>
> The default buffer size in {{PipeMapRed}} is 128K.
> When a mapper input record is too large to fit in the buffer, {{MapRunner}} blocks until the record has been written. If the child process and the input reader are both slow (due to computation and decompression), decoding and reading rarely overlap, which hurts performance.
> I suppose we should make the buffer size configurable.
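The issue was resolved as Invalid, so the following is only a sketch of what the proposal amounts to, not code from Hadoop. It uses `java.util.Properties` as a stand-in for a job configuration; the class `PipeBufferSketch`, its methods, and the property key `example.pipe.buffer.size` are hypothetical. Only the 128K default comes from the report.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

final class PipeBufferSketch {
    // Hypothetical property name, used for illustration only.
    static final String BUFFER_SIZE_KEY = "example.pipe.buffer.size";
    // 128K default, as stated in the report.
    static final int DEFAULT_BUFFER_SIZE = 128 * 1024;

    /** Reads the buffer size from the configuration, falling back to the default. */
    static int bufferSize(Properties conf) {
        String raw = conf.getProperty(BUFFER_SIZE_KEY);
        if (raw == null) {
            return DEFAULT_BUFFER_SIZE;
        }
        int size = Integer.parseInt(raw.trim());
        if (size <= 0) {
            throw new IllegalArgumentException("buffer size must be positive: " + size);
        }
        return size;
    }

    /** Wraps the stream to the child process in a buffer of the configured size. */
    static OutputStream toChild(OutputStream rawPipe, Properties conf) {
        return new BufferedOutputStream(rawPipe, bufferSize(conf));
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty(BUFFER_SIZE_KEY, "1048576"); // 1M, so a 200K record fits
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = toChild(sink, conf)) {
            out.write(new byte[200 * 1024]);
        }
        System.out.println(bufferSize(conf) + " byte buffer, " + sink.size() + " bytes delivered");
    }
}
```

With a buffer larger than the record, the writer can hand off the whole record without waiting for the consumer, which is the overlap the reporter was after.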
[jira] [Commented] (MAPREDUCE-6302) Incorrect headroom can lead to a deadlock between map and reduce allocations
[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941135#comment-14941135 ]

Jason Lowe commented on MAPREDUCE-6302:
---

Since the old code also doesn't preempt if there's room for one map, I'm OK with the current logic. I just didn't want a regression. And as for SHUFFLE phase awareness, I agree that's best left for a followup JIRA.

> Incorrect headroom can lead to a deadlock between map and reduce allocations
> -
>
> Key: MAPREDUCE-6302
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: mai shurong
> Assignee: Karthik Kambatla
> Priority: Critical
> Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-prelim.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
>
> I submitted a big job, with 500 maps and 350 reduces, to a queue (fairscheduler) with a maximum of 300 cores. When the maps of the job reach 100%, 300 reduces have occupied all 300 cores of the queue. Then a map fails and is retried; it waits for a core while the 300 reduces wait for the failed map to finish, so a deadlock occurs. As a result, the job is blocked, and later jobs in the queue cannot run because no cores are available in the queue.
> I think there is a similar issue for the memory of a queue.
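The role of headroom in this deadlock can be shown with a small decision function. This is a sketch under stated assumptions, not the allocator's logic: the class `HeadroomSketch` and its method are hypothetical, each task is assumed to need one core (the report's 300 reduces fill 300 cores), and the "incorrect" headroom value of 100 is invented purely for illustration. The rule itself follows the comment above: no preemption while there is room for one map.

```java
final class HeadroomSketch {
    /**
     * Toy rule: preempt reducers only when maps are pending and the reported
     * headroom cannot fit even a single map.
     */
    static boolean shouldPreemptReducers(int pendingMaps, int headroomCores, int coresPerMap) {
        return pendingMaps > 0 && headroomCores < coresPerMap;
    }

    public static void main(String[] args) {
        int queueMaxCores = 300;  // from the report
        int runningReduces = 300; // from the report, assumed one core each
        int trueHeadroom = queueMaxCores - runningReduces;

        // Correct headroom of 0: the retried map cannot fit, so reducers are preempted.
        System.out.println("true headroom: " + shouldPreemptReducers(1, trueHeadroom, 1));

        // Hypothetical over-reported headroom: the rule sees room for a map,
        // never preempts, and the retried map waits forever.
        System.out.println("over-reported headroom: " + shouldPreemptReducers(1, 100, 1));
    }
}
```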
[jira] [Updated] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated MAPREDUCE-6485:
-
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
Status: Resolved (was: Patch Available)

Committed to branch-2/trunk. Thanks [~xinxianyin] for the contribution, and [~kasha] for the additional review.

> MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
> ---
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
> Reporter: Bob
> Assignee: Xianyin Xin
> Priority: Critical
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6485.001.patch, MAPREDUCE-6485.004.patch, MAPREDUCE-6485.005.patch, MAPREDUCE-6485.006.patch, MAPREDUCE-6845.002.patch, MAPREDUCE-6845.003.patch
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941241#comment-14941241 ]

Hudson commented on MAPREDUCE-6485:
---

FAILURE: Integrated in Hadoop-trunk-Commit #8554 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8554/])
MAPREDUCE-6485. Create a new task attempt with failed map task priority (rohithsharmaks: rev 439f43ad3defbac907eda2d139a793f153544430)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941295#comment-14941295 ]

Hudson commented on MAPREDUCE-6485:
---

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #479 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/479/])
MAPREDUCE-6485. Create a new task attempt with failed map task priority (rohithsharmaks: rev 439f43ad3defbac907eda2d139a793f153544430)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* hadoop-mapreduce-project/CHANGES.txt
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941433#comment-14941433 ]

Hudson commented on MAPREDUCE-6485:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #471 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/471/])
MAPREDUCE-6485. Create a new task attempt with failed map task priority (rohithsharmaks: rev 439f43ad3defbac907eda2d139a793f153544430)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941511#comment-14941511 ]

Hudson commented on MAPREDUCE-6485:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2414 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2414/])
MAPREDUCE-6485. Create a new task attempt with failed map task priority (rohithsharmaks: rev 439f43ad3defbac907eda2d139a793f153544430)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941455#comment-14941455 ]

Hudson commented on MAPREDUCE-6485:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #1209 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1209/])
MAPREDUCE-6485. Create a new task attempt with failed map task priority (rohithsharmaks: rev 439f43ad3defbac907eda2d139a793f153544430)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
[jira] [Commented] (MAPREDUCE-6491) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/MAPREDUCE-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941585#comment-14941585 ]

Dustin Cote commented on MAPREDUCE-6491:

[~jlowe], yes I'll check it out now. I was building against trunk and it looked clean there. Let me see how it goes with branch-2.

> Environment variable handling assumes values should be appended
> ---
>
> Key: MAPREDUCE-6491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6491
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Jason Lowe
> Assignee: Dustin Cote
> Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch, YARN-2369-4.patch, YARN-2369-5.patch, YARN-2369-6.patch, YARN-2369-7.patch, YARN-2369-8.patch, YARN-2369-9.patch
>
>
> When processing environment variables for a container context the code assumes that the value should be appended to any pre-existing value in the environment. This may be desired behavior for handling path-like environment variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a non-intuitive and harmful way to handle any variable that does not have path-like semantics.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
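The distinction drawn in the description, append for path-like variables and replace for everything else, might look like the following. This is one possible policy sketched for illustration; it is not claimed to be what the attached patches do. The class `EnvMergeSketch`, the method `setEnv`, and the fixed set of path-like names (taken from the examples in the description) are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class EnvMergeSketch {
    // Variables treated as path-like in this sketch; the names come from the description.
    static final Set<String> PATH_LIKE = Set.of("PATH", "LD_LIBRARY_PATH", "CLASSPATH");

    /**
     * Appends the value for path-like variables that already have a value,
     * and replaces the value for every other variable.
     */
    static void setEnv(Map<String, String> env, String name, String value, String separator) {
        String existing = env.get(name);
        if (existing != null && PATH_LIKE.contains(name)) {
            env.put(name, existing + separator + value);
        } else {
            env.put(name, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("PATH", "/bin");
        env.put("TZ", "UTC");
        setEnv(env, "PATH", "/opt/bin", ":");
        setEnv(env, "TZ", "PST", ":");
        System.out.println("PATH=" + env.get("PATH") + " TZ=" + env.get("TZ"));
    }
}
```

Appending unconditionally would have produced `TZ=UTC:PST` in this example, which is the kind of harmful result the description warns about.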
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941598#comment-14941598 ]

Hudson commented on MAPREDUCE-6485:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #445 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/445/])
MAPREDUCE-6485. Create a new task attempt with failed map task priority (rohithsharmaks: rev 439f43ad3defbac907eda2d139a793f153544430)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941676#comment-14941676 ]

Hadoop QA commented on MAPREDUCE-6451:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 7s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 7s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 26s | The applied patch generated 3 new checkstyle issues (total was 36, now 28). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests | 6m 32s | Tests passed in hadoop-distcp. |
| | | | 44m 19s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764830/MAPREDUCE-6451-v4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1037ee5 |
| Release Audit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6050/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6050/artifact/patchprocess/diffcheckstylehadoop-distcp.txt |
| hadoop-distcp test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6050/artifact/patchprocess/testrun_hadoop-distcp.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6050/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6050/console |

This message was automatically generated.

> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -
>
> Key: MAPREDUCE-6451
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: distcp
> Affects Versions: 2.6.0
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, MAPREDUCE-6451-v3.patch, MAPREDUCE-6451-v4.patch
>
>
> DistCp when used with dynamic strategy does not update the chunkFilePath and other static variables any time other than for the first job. This is seen when DistCp::run() is used.
> A single copy succeeds but multiple jobs finish successfully without any real copying.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
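The failure mode in the description, static state that is set for the first job and never refreshed, can be reproduced in miniature. This sketch does not use DistCp's classes: `StaticStateSketch`, both methods, and the `/tmp/<jobId>/chunks` path layout are hypothetical and exist only to show the pattern.

```java
final class StaticStateSketch {
    // Set by the first job in the JVM and never refreshed: the buggy pattern.
    private static String cachedChunkDir;

    /** Returns the chunk directory, computing it only once per JVM. */
    static String chunkDirCached(String jobId) {
        if (cachedChunkDir == null) {
            cachedChunkDir = "/tmp/" + jobId + "/chunks"; // hypothetical layout
        }
        return cachedChunkDir;
    }

    /** Returns the chunk directory derived from the job it is asked about. */
    static String chunkDirPerJob(String jobId) {
        return "/tmp/" + jobId + "/chunks"; // hypothetical layout
    }
}
```

Calling `chunkDirCached` for a first job and then for a second job in the same JVM returns the first job's directory both times, so the second job would look for chunks that are not there. `chunkDirPerJob` has no state shared between jobs and returns a different directory for each one.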
[jira] [Updated] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated MAPREDUCE-6451:
---
Attachment: MAPREDUCE-6451-v4.patch

Thank you [~eepayne] for your review comments. I have updated the patch. Regarding the checkstyle issue about the missing package-info.java: the file did not previously exist for the mapred/lib directory, so I did not add one now. There is already one for the tools package. I was referring to the wrong .class for the new file and have corrected that. Requesting review.
[jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
[ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941658#comment-14941658 ]

Hudson commented on MAPREDUCE-6485:
-----------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2385 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2385/])
MAPREDUCE-6485. Create a new task attempt with failed map task priority (rohithsharmaks: rev 439f43ad3defbac907eda2d139a793f153544430)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java

> MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
>            Reporter: Bob
>            Assignee: Xianyin Xin
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: MAPREDUCE-6485.001.patch, MAPREDUCE-6485.004.patch, MAPREDUCE-6485.005.patch, MAPREDUCE-6485.006.patch, MAPREDUCE-6845.002.patch, MAPREDUCE-6845.003.patch
>
>
> The scenario is as follows:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8, reduces take resources and start running before all maps have finished.
> It can happen that all resources are taken up by running reduces while one map is still unfinished.
> In this situation the last map has two task attempts.
> The first attempt was killed due to a timeout (mapreduce.task.timeout), and its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to FAILED. The failed map attempt is not restarted, however, because a speculative map attempt is still in progress.
> The second attempt, which was started because map task speculation is enabled, is pending in the UNASSIGNED state because no resources are available.
> The second map attempt's request has a lower priority than the reduces, so preemption does not happen.
> As a result, the reduces cannot finish because one map is left, and the last map hangs because no resources are available, so the job never finishes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
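
The starvation described in the issue can be illustrated with a toy model. The sketch below is not the actual RMContainerAllocator logic; the class name, the method, and the numeric priority values are hypothetical and chosen only for illustration, under the assumption that a smaller number means a more urgent request and that a reducer is preempted only for a request that outranks it. It shows why re-requesting the last map at the failed-map priority, as the commit message describes, would let the job proceed in this model.

```java
/** Toy model of the hang in MAPREDUCE-6485; not the real allocator code. */
final class MapStarvationSketch {
    // Illustrative values (assumption): a smaller number is a more urgent request.
    static final int FAILED_MAP_PRIORITY = 5;
    static final int REDUCE_PRIORITY = 10;
    static final int NORMAL_MAP_PRIORITY = 20;

    /**
     * Returns true if the last map attempt can obtain a container.
     * With no free containers, every container is held by a reducer that
     * cannot finish until the map does, so the map only runs if its request
     * outranks the reducers and one of them is preempted.
     */
    static boolean lastMapCanRun(int pendingMapPriority, int freeContainers) {
        if (freeContainers > 0) {
            return true; // no contention, the request is simply granted
        }
        return pendingMapPriority < REDUCE_PRIORITY;
    }

    public static void main(String[] args) {
        System.out.println("normal map priority, cluster full: "
                + lastMapCanRun(NORMAL_MAP_PRIORITY, 0));
        System.out.println("failed-map priority, cluster full: "
                + lastMapCanRun(FAILED_MAP_PRIORITY, 0));
    }
}
```

In this model the request at normal map priority never runs while the cluster is full, which corresponds to the reported hang; the same request at the failed-map priority does.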
[jira] [Created] (MAPREDUCE-6500) DynamicInputChunk and DynamicRecordReader class has no unit tests
Kuhu Shukla created MAPREDUCE-6500:
--------------------------------------

             Summary: DynamicInputChunk and DynamicRecordReader class has no unit tests
                 Key: MAPREDUCE-6500
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6500
             Project: Hadoop Map/Reduce
          Issue Type: Test
          Components: distcp
            Reporter: Kuhu Shukla
            Assignee: Kuhu Shukla
            Priority: Minor


The Dynamic strategy of DistCp has test coverage only for its InputFormat class. It would be nice to have coverage for the DynamicRecordReader and DynamicInputChunk classes as well.
[jira] [Updated] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated MAPREDUCE-6451:
-----------------------------------
    Attachment: MAPREDUCE-6451-v4.patch

The release audit warning:
{noformat}
!? /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/hadoop-common-project/hadoop-common/CHANGES-HDFS-EC-7285.txt
Lines that start with ? in the release audit report indicate files that do not have an Apache license header.
{noformat}
seems unrelated. Removed the whitespace issue. The rest is the same as before.

> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6451
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 2.6.0
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, MAPREDUCE-6451-v3.patch, MAPREDUCE-6451-v4.patch
>
>
> DistCp, when used with the dynamic strategy, updates the chunkFilePath and other static variables only for the first job. This is seen when DistCp::run() is used.
> A single copy succeeds, but when multiple jobs are run, the later jobs finish successfully without any real copying.
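
The class of bug described here, static state that is set for the first job and then reused by later jobs in the same JVM, can be sketched in isolation. The example below is a minimal illustration and not DistCp code: the class names, the path layout, and the per-job context shown as a remedy are all hypothetical, and the remedy is one possible approach rather than the change made in the attached patches.

```java
/** Minimal illustration of stale static state across jobs; not DistCp code. */
final class StaleStaticSketch {

    /** Buggy variant: the directory is cached in a static field on first use. */
    static final class BuggyChunk {
        private static String chunkDir; // shared by every job in this JVM

        static String chunkDirFor(String jobId) {
            if (chunkDir == null) {
                // Hypothetical path layout, computed only for the first job.
                chunkDir = "/tmp/" + jobId + "/chunks";
            }
            return chunkDir; // later jobs silently receive the first job's path
        }
    }

    /** One possible remedy: hold the directory in per-job state instead. */
    static final class ChunkContext {
        private final String chunkDir;

        ChunkContext(String jobId) {
            this.chunkDir = "/tmp/" + jobId + "/chunks";
        }

        String chunkDir() {
            return chunkDir;
        }
    }

    public static void main(String[] args) {
        System.out.println(BuggyChunk.chunkDirFor("job_1"));
        System.out.println(BuggyChunk.chunkDirFor("job_2")); // stale: job_1's directory
        System.out.println(new ChunkContext("job_2").chunkDir());
    }
}
```

With the static cache, the second job looks in the first job's directory, which is consistent with later jobs completing without copying anything; per-job state avoids the carry-over.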
[jira] [Updated] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated MAPREDUCE-6451:
-----------------------------------
    Attachment:     (was: MAPREDUCE-6451-v4.patch)
[jira] [Updated] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated MAPREDUCE-6451:
-----------------------------------
    Attachment: MAPREDUCE-6451-v5.patch

> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6451
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 2.6.0
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, MAPREDUCE-6451-v3.patch, MAPREDUCE-6451-v4.patch, MAPREDUCE-6451-v5.patch
>
>
> DistCp, when used with the dynamic strategy, updates the chunkFilePath and other static variables only for the first job. This is seen when DistCp::run() is used.
> A single copy succeeds, but when multiple jobs are run, the later jobs finish successfully without any real copying.
[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941784#comment-14941784 ]

Hadoop QA commented on MAPREDUCE-6451:
--------------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 23s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 11m 43s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 18s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 27s | The applied patch generated 2 new checkstyle issues (total was 36, now 27). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 48s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 56s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests | 7m 40s | Tests passed in hadoop-distcp. |
| | | 47m 55s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764842/MAPREDUCE-6451-v5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fdf02d1 |
| Release Audit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6051/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6051/artifact/patchprocess/diffcheckstylehadoop-distcp.txt |
| hadoop-distcp test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6051/artifact/patchprocess/testrun_hadoop-distcp.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6051/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6051/console |


This message was automatically generated.