[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440439#comment-15440439 ]
Hadoop QA commented on MAPREDUCE-6771:
--
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 13m 21s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 17s | trunk passed |
| +1 | compile | 0m 22s | trunk passed |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvnsite | 0m 29s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 0m 34s | trunk passed |
| +1 | javadoc | 0m 15s | trunk passed |
| +1 | mvninstall | 0m 21s | the patch passed |
| +1 | compile | 0m 20s | the patch passed |
| +1 | javac | 0m 20s | the patch passed |
| +1 | checkstyle | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 39s | the patch passed |
| +1 | javadoc | 0m 12s | the patch passed |
| +1 | unit | 8m 44s | hadoop-mapreduce-client-app in the patch passed. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 34m 58s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825749/mapreduce6771.001.patch |
| JIRA Issue | MAPREDUCE-6771 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 3f99f0b54520 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 19c743c |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6700/testReport/ |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6700/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
>
> Key: MAPREDUCE-6771
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
> Project: Hadoop Map/Reduce
> Issue Type: Bug
>
[jira] [Updated] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haibo Chen updated MAPREDUCE-6771:
--
Status: Patch Available (was: Open)

> Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
>
> Key: MAPREDUCE-6771
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.7.3
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: mapreduce6771.001.patch
>
> Task containers can exceed their resource limit and be killed by the Node Manager. The MR AM is then notified of the container status and diagnostics information through its heartbeat with the RM. However, the diagnostics information may never reach the .jhist file, so when the job completes, the diagnostics information associated with the failed task attempts is empty. This makes it hard for users to find the root cause of job failures that are often caused by memory leaks.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haibo Chen updated MAPREDUCE-6771:
--
Attachment: mapreduce6771.001.patch

Uploading a patch to fix this. I am not sure how a unit test can be written for it; any suggestions are greatly appreciated.
[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440379#comment-15440379 ]
Haibo Chen commented on MAPREDUCE-6771:
---
If tasks are killed or fail on the NM before they can notify the AM, the user needs to dig through the NM logs or the task logs, hoping to find some useful information about why the task attempt failed.
[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440372#comment-15440372 ]
Haibo Chen commented on MAPREDUCE-6771:
---
Analysis:
{code:java}
RMContainerAllocator.getResources() {
  ...
  for (ContainerStatus cont : finishedContainers) {
    LOG.info("Received completed container " + cont.getContainerId());
    TaskAttemptId attemptID = assignedRequests.get(cont.getContainerId());
    if (attemptID == null) {
      LOG.error("Container complete event for unknown container id "
          + cont.getContainerId());
    } else {
      pendingRelease.remove(cont.getContainerId());
      assignedRequests.remove(attemptID);
      // send the container completed event to Task attempt
      eventHandler.handle(createContainerFinishedEvent(cont, attemptID));
      // Send the diagnostics (note: enqueued AFTER the completion event)
      String diagnostics = StringInterner.weakIntern(cont.getDiagnostics());
      eventHandler.handle(new TaskAttemptDiagnosticsUpdateEvent(attemptID,
          diagnostics));
      preemptionPolicy.handleCompletedContainer(attemptID);
    }
  }
  ...
}
{code}
The scenario in question is as follows. A job is running, and one of its task attempts, running on an NM, is killed by the NM because the container exceeds its resource limit. The NM sends the container status/diagnostics to the RM, which later passes it to the MR AM in the AM's periodic heartbeat with the RM, as shown above. In the MR AM, the task attempt is still in the RUNNING state from the AM's perspective, since the task heartbeat has not timed out.

Upon learning from the RM that the task attempt's container has finished, the RMCommunicator thread places a ContainerFinishedEvent and then a TaskAttemptDiagnosticsUpdateEvent in the event queue. The ContainerFinishedEvent causes the task attempt in the MR AM to transition from RUNNING to FAILED and causes a TaskAttemptUnsuccessfulCompletionEvent, which carries the diagnostics information associated with the attempt at that moment, to be written to the .jhist file. The TaskAttemptDiagnosticsUpdateEvent updates the diagnostics information associated with the task attempt. But since the ContainerFinishedEvent is placed and processed before the TaskAttemptDiagnosticsUpdateEvent, the TaskAttemptUnsuccessfulCompletionEvent written to the .jhist file does not contain the diagnostics info received from the RM.

When the user accesses the failed task attempts through the JHS after the job has completed, the TaskAttemptUnsuccessfulCompletionEvent is parsed to generate the failed attempt page. The page will not have the diagnostics info from the RM (such as "container killed by Node Manager...") because it was never written to .jhist in the first place.
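The ordering problem analyzed in this comment can be reproduced in a small, self-contained sketch. It does not use the real MR AM classes: `Attempt`, `Event`, `run`, and the `history` list are hypothetical stand-ins for the task attempt, the dispatcher events, the single-threaded event queue, and the .jhist file. The second call only illustrates that the record is complete when the diagnostics update is handled first; the thread does not say whether the attached patch takes that approach.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class DiagnosticsOrderingSketch {
  /** Hypothetical stand-in for a task attempt held by the AM. */
  static class Attempt {
    final StringBuilder diagnostics = new StringBuilder();
    // Stands in for the records written to the .jhist file.
    final List<String> history = new ArrayList<String>();
  }

  /** Hypothetical event: mutates the attempt when the dispatcher handles it. */
  interface Event {
    void apply(Attempt attempt);
  }

  static Event containerFinished() {
    return new Event() {
      @Override
      public void apply(Attempt a) {
        // RUNNING -> FAILED: the record snapshots the diagnostics known so far.
        a.history.add("FAILED diagnostics=[" + a.diagnostics + "]");
      }
    };
  }

  static Event diagnosticsUpdate(final String text) {
    return new Event() {
      @Override
      public void apply(Attempt a) {
        a.diagnostics.append(text);
      }
    };
  }

  /** Handles the events in FIFO order, like a single-threaded dispatcher. */
  static List<String> run(Event... events) {
    Queue<Event> queue = new ArrayDeque<Event>();
    for (Event e : events) {
      queue.add(e);
    }
    Attempt attempt = new Attempt();
    while (!queue.isEmpty()) {
      queue.poll().apply(attempt);
    }
    return attempt.history;
  }

  public static void main(String[] args) {
    String diag = "Container killed by Node Manager";
    // Order described in the analysis: the record is written without diagnostics.
    System.out.println(run(containerFinished(), diagnosticsUpdate(diag)));
    // Reversed order: the record carries the diagnostics.
    System.out.println(run(diagnosticsUpdate(diag), containerFinished()));
  }
}
```

In the first call the history record is written before the diagnostics arrive, which is the empty-diagnostics symptom reported in this issue.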
[jira] [Updated] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haibo Chen updated MAPREDUCE-6771:
--
Summary: Diagnostics information can be lost in .jhist if task containers are killed by Node Manager. (was: Diagnostics information is lost in .jhist if task containers are killed by Node Manager.)
[jira] [Created] (MAPREDUCE-6771) Diagnostics information is lost in .jhist if task containers are killed by Node Manager.
Haibo Chen created MAPREDUCE-6771:
-
Summary: Diagnostics information is lost in .jhist if task containers are killed by Node Manager.
Key: MAPREDUCE-6771
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.7.3
Reporter: Haibo Chen
Assignee: Haibo Chen

Task containers can exceed their resource limit and be killed by the Node Manager. The MR AM is then notified of the container status and diagnostics information through its heartbeat with the RM. However, the diagnostics information may never reach the .jhist file, so when the job completes, the diagnostics information associated with the failed task attempts is empty. This makes it hard for users to find the root cause of job failures that are often caused by memory leaks.
[jira] [Updated] (MAPREDUCE-6769) Fix forgotten conversion from "slave" to "worker" in mapred script
[ https://issues.apache.org/jira/browse/MAPREDUCE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer updated MAPREDUCE-6769:
--
Status: Patch Available (was: Open)

> Fix forgotten conversion from "slave" to "worker" in mapred script
>
> Key: MAPREDUCE-6769
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6769
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.0.0-alpha2
> Reporter: Albert Chu
> Assignee: Albert Chu
> Priority: Minor
>
> In HADOOP-13209 (commit 23c3ff85a9e73d8f0755e14f12cc7c89b72acddd), "slaves" was replaced with "workers", including the function name change from hadoop_common_slave_mode_execute to hadoop_common_worker_mode_execute and the environment variable name change from HADOOP_SLAVE_MODE to HADOOP_WORKER_MODE.
> It appears this change was forgotten in hadoop-mapreduce-project/bin/mapred.
> GitHub pull request with fix to be sent shortly.
[jira] [Updated] (MAPREDUCE-6769) Fix forgotten conversion from "slave" to "worker" in mapred script
[ https://issues.apache.org/jira/browse/MAPREDUCE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer updated MAPREDUCE-6769:
--
Assignee: Albert Chu
[jira] [Created] (MAPREDUCE-6770) NodeHealthScriptRunner#reportHealthStatus bug fix
Yufei Gu created MAPREDUCE-6770:
---
Summary: NodeHealthScriptRunner#reportHealthStatus bug fix
Key: MAPREDUCE-6770
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6770
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: nodemanager
Reporter: Yufei Gu
Assignee: Yufei Gu

{code}
case FAILED_WITH_EXIT_CODE:
  setHealthStatus(true, "", now);
  break;
{code}
should be
{code}
case FAILED_WITH_EXIT_CODE:
  setHealthStatus(false, "", now);
  break;
{code}
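To make the intent of the proposed one-line change concrete, here is a simplified sketch. It is not the real `NodeHealthScriptRunner`: the class `HealthStatusSketch`, the two-value `ExitStatus` enum, and the fields are hypothetical, and the sketch assumes that the first argument of `setHealthStatus` means "the node is healthy". It shows only the behaviour the reporter proposes, under which a non-zero script exit code marks the node unhealthy.

```java
class HealthStatusSketch {
  /** Hypothetical subset of the health script's exit states. */
  enum ExitStatus { SUCCESS, FAILED_WITH_EXIT_CODE }

  private boolean healthy = true;
  private String report = "";
  private long lastReportedTime;

  /** Assumption: the first argument means "the node is healthy". */
  void setHealthStatus(boolean isHealthy, String output, long time) {
    this.healthy = isHealthy;
    this.report = output;
    this.lastReportedTime = time;
  }

  boolean isHealthy() {
    return healthy;
  }

  void reportHealthStatus(ExitStatus status) {
    long now = System.currentTimeMillis();
    switch (status) {
      case SUCCESS:
        setHealthStatus(true, "", now);
        break;
      case FAILED_WITH_EXIT_CODE:
        // Behaviour proposed in the report: a failing script marks the node unhealthy.
        setHealthStatus(false, "", now);
        break;
    }
  }

  public static void main(String[] args) {
    HealthStatusSketch runner = new HealthStatusSketch();
    runner.reportHealthStatus(ExitStatus.FAILED_WITH_EXIT_CODE);
    System.out.println("healthy after non-zero exit: " + runner.isHealthy());
  }
}
```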
[jira] [Commented] (MAPREDUCE-6768) TestRecovery.testSpeculative failed with NPE
[ https://issues.apache.org/jira/browse/MAPREDUCE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439675#comment-15439675 ]
Hadoop QA commented on MAPREDUCE-6768:
--
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 16m 23s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 48s | trunk passed |
| +1 | compile | 0m 24s | trunk passed |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 30s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 0m 34s | trunk passed |
| +1 | javadoc | 0m 16s | trunk passed |
| +1 | mvninstall | 0m 21s | the patch passed |
| +1 | compile | 0m 20s | the patch passed |
| +1 | javac | 0m 20s | the patch passed |
| -1 | checkstyle | 0m 14s | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: The patch generated 1 new + 118 unchanged - 1 fixed = 119 total (was 119) |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 40s | the patch passed |
| +1 | javadoc | 0m 13s | the patch passed |
| +1 | unit | 8m 49s | hadoop-mapreduce-client-app in the patch passed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 37m 39s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825692/mapreduce6768.002.patch |
| JIRA Issue | MAPREDUCE-6768 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux cacaee0233c1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cde3a00 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6699/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6699/testReport/ |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6699/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> TestRecovery.testSpeculative failed with NPE
>
> Key: MAPREDUCE-6768
> URL: https://issue
[jira] [Commented] (MAPREDUCE-6768) TestRecovery.testSpeculative failed with NPE
[ https://issues.apache.org/jira/browse/MAPREDUCE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439643#comment-15439643 ]
Jason Lowe commented on MAPREDUCE-6768:
---
bq. I guess I must be following some bad practice I have seen in the code base.

It's not a terrible practice, just that I've been sensitive to unnecessarily long sleeps in unit tests lately. See the discussion at YARN-5393 for details.

+1 pending Jenkins. The patch still can't be used as-is on other branches since JDK7 will want task1Attempt2 to be final for use in the inner class, but that's something I can easily fix during the commit.

> TestRecovery.testSpeculative failed with NPE
>
> Key: MAPREDUCE-6768
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6768
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: mapreduce6768.001.patch, mapreduce6768.002.patch
>
> 1 tests failed.
> REGRESSION: org.apache.hadoop.mapreduce.v2.app.TestRecovery.testSpeculative
> Error Message:
> null
> Stack Trace:
> java.lang.NullPointerException: null
> at org.apache.hadoop.mapreduce.v2.app.TestRecovery.testSpeculative(TestRecovery.java:1201)
[jira] [Commented] (MAPREDUCE-6768) TestRecovery.testSpeculative failed with NPE
[ https://issues.apache.org/jira/browse/MAPREDUCE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439589#comment-15439589 ]
Haibo Chen commented on MAPREDUCE-6768:
---
Thanks for the review, Jason! I guess I must be following some bad practice I have seen in the code base. In the new patch, I have increased the overall timeout to 10s and lowered the check interval to 10 milliseconds. I have also removed the use of the lambda.
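The polling approach described in this comment (an overall timeout of 10 s, a check interval of 10 ms, and no lambda) might look like the sketch below. `PollingWaitSketch`, `Check`, and `waitFor` are hypothetical names written for this illustration and are not the helper used in the patch. The captured local is declared `final` because Java 7 requires that for variables used inside an anonymous inner class.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class PollingWaitSketch {
  /** Hypothetical condition callback, usable without lambdas. */
  interface Check {
    boolean isDone();
  }

  /** Polls the check every intervalMs until it passes or timeoutMs elapses. */
  static void waitFor(Check check, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long start = System.nanoTime();
    long timeoutNanos = timeoutMs * 1000000L;
    while (!check.isDone()) {
      if (System.nanoTime() - start >= timeoutNanos) {
        throw new TimeoutException(
            "condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    // Must be final to be captured by the anonymous classes under Java 7.
    final AtomicBoolean launched = new AtomicBoolean(false);

    Thread worker = new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          Thread.sleep(50); // simulate asynchronous work
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        launched.set(true);
      }
    });
    worker.start();

    // Short interval: returns soon after the condition holds.
    // Long timeout: tolerates a slow VM without slowing the normal case.
    waitFor(new Check() {
      @Override
      public boolean isDone() {
        return launched.get();
      }
    }, 10, 10000);
    System.out.println("condition met");
  }
}
```

A short interval with a generous timeout keeps the common case fast while still tolerating slow test machines, which matches the concern raised in the review.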
[jira] [Updated] (MAPREDUCE-6768) TestRecovery.testSpeculative failed with NPE
[ https://issues.apache.org/jira/browse/MAPREDUCE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haibo Chen updated MAPREDUCE-6768:
--
Attachment: mapreduce6768.002.patch
[jira] [Commented] (MAPREDUCE-6769) Fix forgotten conversion from "slave" to "worker" in mapred script
[ https://issues.apache.org/jira/browse/MAPREDUCE-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439526#comment-15439526 ]
ASF GitHub Bot commented on MAPREDUCE-6769:
---
GitHub user chu11 opened a pull request:

    https://github.com/apache/hadoop/pull/123

    MAPREDUCE-6769. Fix forgotten name conversion from "slave" to "worker" in mapred script, most notably fixing environment variable name change and function name change.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chu11/hadoop MAPREDUCE-6769

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/123.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #123

commit 2bdf1a0e3e993a1bd7b1dd94e4d4fd42b6d26907
Author: Albert Chu
Date: 2016-08-26T18:19:09Z

    Fix forgotten name conversion from "slave" to "worker" in mapred script, most notably fixing environment variable name change and function name change.
[jira] [Created] (MAPREDUCE-6769) Fix forgotten conversion from "slave" to "worker" in mapred script
Albert Chu created MAPREDUCE-6769:
-
Summary: Fix forgotten conversion from "slave" to "worker" in mapred script
Key: MAPREDUCE-6769
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6769
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: scripts
Affects Versions: 3.0.0-alpha2
Reporter: Albert Chu
Priority: Minor

In HADOOP-13209 (commit 23c3ff85a9e73d8f0755e14f12cc7c89b72acddd), "slaves" was replaced with "workers", including the function name change from hadoop_common_slave_mode_execute to hadoop_common_worker_mode_execute and the environment variable name change from HADOOP_SLAVE_MODE to HADOOP_WORKER_MODE.
It appears this change was forgotten in hadoop-mapreduce-project/bin/mapred.
GitHub pull request with fix to be sent shortly.
[jira] [Commented] (MAPREDUCE-6740) Enforce mapreduce.task.timeout to be at least mapreduce.task.progress-report.interval
[ https://issues.apache.org/jira/browse/MAPREDUCE-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439452#comment-15439452 ]
Karthik Kambatla commented on MAPREDUCE-6740:
-
The latest patch looks good. A couple of minor comments:
# TaskHeartbeatHandler: When declaring taskTimeout, avoid setting the value to {{5 * 60 * 1000}}.
# Do we need a test case for when we don't set TASK_REPORT_INTERVAL?

> Enforce mapreduce.task.timeout to be at least mapreduce.task.progress-report.interval
>
> Key: MAPREDUCE-6740
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6740
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.8.0
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Minor
> Attachments: mapreduce6740.001.patch, mapreduce6740.002.patch, mapreduce6740.003.patch, mapreduce6740.004.patch, mapreduce6740.005.patch, mapreduce6740.006.patch
>
> MAPREDUCE-6242 makes the task status update interval configurable to ease the pressure on the MR AM to process status updates, but it did not ensure that mapreduce.task.timeout is no smaller than the configured task report interval.
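The constraint named in the issue title could be enforced along the lines of the sketch below. This is an illustration only: it uses `java.util.Properties` in place of Hadoop's configuration class, the class `TaskTimeoutSketch`, the method `effectiveTimeoutMs`, and the two default values are assumptions made for this example, and raising a too-small timeout to the report interval is just one possible policy. The thread does not describe how the patch handles that case.

```java
import java.util.Properties;

class TaskTimeoutSketch {
  static final String TASK_TIMEOUT = "mapreduce.task.timeout";
  static final String REPORT_INTERVAL =
      "mapreduce.task.progress-report.interval";

  // Illustrative defaults chosen for this sketch (assumptions).
  static final long DEFAULT_TIMEOUT_MS = 600000L;
  static final long DEFAULT_INTERVAL_MS = 3000L;

  /** Reads a millisecond value, falling back to the default when unset. */
  static long getLong(Properties conf, String key, long defaultValue) {
    return Long.parseLong(conf.getProperty(key, String.valueOf(defaultValue)));
  }

  /**
   * Returns a timeout that is never smaller than the report interval, so a
   * task cannot time out before it has had a chance to report progress.
   */
  static long effectiveTimeoutMs(Properties conf) {
    long timeout = getLong(conf, TASK_TIMEOUT, DEFAULT_TIMEOUT_MS);
    long interval = getLong(conf, REPORT_INTERVAL, DEFAULT_INTERVAL_MS);
    return Math.max(timeout, interval);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(TASK_TIMEOUT, "1000");
    conf.setProperty(REPORT_INTERVAL, "5000");
    // The configured timeout is below the interval, so the interval wins.
    System.out.println(effectiveTimeoutMs(conf));
  }
}
```

Covering the case where the report interval is not set, as the second review comment asks, amounts to calling `effectiveTimeoutMs` on a configuration that contains only the timeout.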
[jira] [Commented] (MAPREDUCE-6768) TestRecovery.testSpeculative failed with NPE
[ https://issues.apache.org/jira/browse/MAPREDUCE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438934#comment-15438934 ]
Jason Lowe commented on MAPREDUCE-6768:
---
Thanks for the patch! I suspect this patch is going to be appropriate for more than just trunk, so it would be good to avoid the lambda use.

I think a wait of only 800 msec is going to be too short if the test runs on a slow VM or some other hiccup occurs.

Nit: Any reason to wait 100 msec instead of 10 per iteration? Yes, I'm overly sensitive to sleeps lately with all the slow YARN tests. ;-)