[jira] [Updated] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3565: Attachment: YARN-3565-20150502-1.patch Hi [~Wangd], Attaching a patch with the modifications to support NodeLabel instead of string in NM HB/Register. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Attachments: YARN-3565-20150502-1.patch Now NM HB/Register uses Set<String>; it will be hard to add new fields if we want to support specifying NodeLabel type such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
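For illustration, a minimal sketch of what a label-aware request could look like once the label set carries NodeLabel objects rather than plain strings; the interface and method names below are assumptions for this digest, not the committed NodeHeartbeatRequest/RegisterNodeManagerRequest API.
{code}
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeLabel;

// Sketch only: a request that reports labels as NodeLabel objects, so new
// per-label attributes (e.g. exclusivity) can be added later without
// changing the wire type again. Names are illustrative, not the real API.
public interface NodeLabelAwareRequestSketch {

  // Previously the labels travelled as Set<String>.
  Set<NodeLabel> getNodeLabels();

  void setNodeLabels(Set<NodeLabel> nodeLabels);
}
{code}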
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525103#comment-14525103 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 32s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 42s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 19s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 8m 50s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 6m 47s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 5m 50s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 52m 13s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 117m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729924/YARN-3381-003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/7630/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7630/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7630/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7630/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7630/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7630/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7630/console | This message was automatically generated. 
A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
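Since the misspelled class lives in a public package, one common way to fix the name without breaking existing callers is to introduce the correctly spelled class and keep the old name as a deprecated alias. The sketch below only illustrates that pattern under that assumption; it is not necessarily what the attached patches do.
{code}
package org.apache.hadoop.yarn.state;

import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

// Sketch: correctly spelled exception (each public class in its own file).
public class InvalidStateTransitionException extends YarnRuntimeException {
  private static final long serialVersionUID = 1L;

  public InvalidStateTransitionException(Enum<?> currentState, Enum<?> event) {
    super("Invalid event: " + event + " at " + currentState);
  }
}
{code}
{code}
package org.apache.hadoop.yarn.state;

// Sketch: the old, misspelled name kept as a deprecated alias so existing
// callers and catch blocks keep compiling during the transition.
@Deprecated
public class InvalidStateTransitonException extends InvalidStateTransitionException {
  private static final long serialVersionUID = 1L;

  public InvalidStateTransitonException(Enum<?> currentState, Enum<?> event) {
    super(currentState, event);
  }
}
{code}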
[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high
[ https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525143#comment-14525143 ] sandflee commented on YARN-3554: Setting this to a bigger value may be based on network-partition considerations, not only on NM restart. Default value for maximum nodemanager connect wait time is too high --- Key: YARN-3554 URL: https://issues.apache.org/jira/browse/YARN-3554 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Naganarasimha G R Labels: newbie Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 msec, or 15 minutes, which is way too high. The default container expiry time from the RM and the default task timeout in MapReduce are both only 10 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
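Whatever default eventually ships, the wait can already be capped per client through configuration. A minimal sketch, using the key named in this report; the 3-minute value and the companion retry-interval key are illustrative choices, not the values proposed in the attached patches.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NmConnectWaitExample {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Cap the total time spent retrying a NodeManager connection at 3 minutes
    // instead of the 15-minute default discussed above (value is illustrative).
    conf.setLong("yarn.client.nodemanager-connect.max-wait-ms", 180000L);
    // Optionally shorten the per-attempt retry interval as well.
    conf.setLong("yarn.client.nodemanager-connect.retry-interval-ms", 10000L);
    System.out.println(conf.get("yarn.client.nodemanager-connect.max-wait-ms"));
  }
}
{code}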
[jira] [Commented] (YARN-3513) Remove unused variables in ContainersMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525252#comment-14525252 ] Naganarasimha G R commented on YARN-3513: - Thanks for commenting on this [~gtCarrera]. I found this out when I was referring to the YARN-3334 (2928 sub-JIRA) patch modifications after it was committed; as per the code in the 2928 branch, {{vmemStillInUsage}} and {{pmemStillInUsage}} have not been used, while the other variables {{currentPmemUsage}} and {{cpuUsageTotalCoresPercentage}} have been used to publish the container metrics to ATS. Remove unused variables in ContainersMonitorImpl Key: YARN-3513 URL: https://issues.apache.org/jira/browse/YARN-3513 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3513.20150421-1.patch class members : {{private final Context context;}} and some local variables in MonitoringThread.run() : {{vmemStillInUsage and pmemStillInUsage}} are not used, just updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525299#comment-14525299 ] Hudson commented on YARN-3363: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #182 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/182/]) YARN-3363. add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. (zxu via rkanter) (rkanter: rev ac7d152901e29b1f444507fe4e421eb6e1402b5a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerStartMonitoringEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Fix For: 2.8.0 Attachments: YARN-3363.000.patch, YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Currently ContainerMetrics has container's actual memory usage(YARN-2984), actual CPU usage(YARN-3122), resource and pid(YARN-3022). It will be better to have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
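A minimal sketch of how per-container timing gauges of this kind are typically recorded with the Hadoop metrics2 library; the field and method names are assumptions for illustration, not the fields the committed patch adds to ContainerMetrics.
{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Sketch only: timing gauges in the metrics2 style used by ContainerMetrics.
public class ContainerTimingMetricsSketch {
  private final MetricsRegistry registry = new MetricsRegistry("ContainerResource");

  private final MutableGaugeLong localizationDurationMs =
      registry.newGauge("localizationDurationMs", "Time spent localizing resources", 0L);
  private final MutableGaugeLong launchDurationMs =
      registry.newGauge("launchDurationMs", "Time from localized to launched", 0L);

  public void recordLocalizationDuration(long startMs, long doneMs) {
    localizationDurationMs.set(doneMs - startMs);
  }

  public void recordLaunchDuration(long localizedMs, long launchedMs) {
    launchDurationMs.set(launchedMs - localizedMs);
  }
}
{code}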
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525296#comment-14525296 ] Hudson commented on YARN-2893: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #182 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/182/]) YARN-2893. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream. (Zhihai Xu via gera) (gera: rev f8204e241d9271497defd4d42646fb89c61cefe3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/CHANGES.txt AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
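The failure surfaces when the tokens in the AM launch context are parsed. Below is a defensive sketch of that parsing step, assuming (without confirmation from this issue) that a shared ByteBuffer whose position had already advanced is the kind of condition that produces the truncated read.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.io.DataInputByteBuffer;
import org.apache.hadoop.security.Credentials;

// Sketch: read tokens from a launch-context buffer without mutating the
// caller's ByteBuffer, so repeated reads (e.g. on AM retry) always see the
// full serialized token storage.
public final class TokenBufferReaderSketch {
  private TokenBufferReaderSketch() {
  }

  public static Credentials readTokens(ByteBuffer tokenBuffer) throws IOException {
    ByteBuffer copy = tokenBuffer.duplicate();
    copy.rewind();
    DataInputByteBuffer in = new DataInputByteBuffer();
    in.reset(copy);
    Credentials credentials = new Credentials();
    credentials.readTokenStorageStream(in);
    return credentials;
  }
}
{code}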
[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525298#comment-14525298 ] Hudson commented on YARN-3006: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #182 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/182/]) YARN-3006. Improve the error message when attempting manual failover with auto-failover enabled. (Akira AJISAKA via wangda) (wangda: rev 7d46a806e71de6692cd230e64e7de18a8252019d) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0, 2.7.1 Attachments: YARN-3006.001.patch When executing manual failover with automatic failover enabled, UnsupportedOperationException is thrown. {code} # yarn rmadmin -failover rm1 rm2 Exception in thread main java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51) at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94) at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311) at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622) {code} I'm thinking the above message is confusing to users. (Users may think whether ZKFC is configured correctly...) The command should output error message to stderr instead of throwing Exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
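A small sketch of the behaviour the report asks for: surface the unsupported manual-failover case as a readable message on stderr and a non-zero exit code rather than an uncaught exception. This only illustrates the desired CLI behaviour, not the change committed to HAAdmin.
{code}
import java.io.PrintStream;

// Sketch of the desired CLI behaviour only.
public class ManualFailoverRejectionSketch {
  private final PrintStream errOut;

  public ManualFailoverRejectionSketch(PrintStream errOut) {
    this.errOut = errOut;
  }

  /** Prints a readable error and returns a non-zero CLI exit code. */
  public int rejectManualFailover() {
    errOut.println("failover: manual failover is not supported when automatic"
        + " failover is enabled; disable automatic failover or let the"
        + " failover controller drive the transition.");
    return -1;
  }

  public static void main(String[] args) {
    System.exit(new ManualFailoverRejectionSketch(System.err).rejectManualFailover());
  }
}
{code}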
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525215#comment-14525215 ] Hudson commented on YARN-2893: -- FAILURE: Integrated in Hadoop-Yarn-trunk #915 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/915/]) YARN-2893. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream. (Zhihai Xu via gera) (gera: rev f8204e241d9271497defd4d42646fb89c61cefe3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525217#comment-14525217 ] Hudson commented on YARN-3006: -- FAILURE: Integrated in Hadoop-Yarn-trunk #915 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/915/]) YARN-3006. Improve the error message when attempting manual failover with auto-failover enabled. (Akira AJISAKA via wangda) (wangda: rev 7d46a806e71de6692cd230e64e7de18a8252019d) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0, 2.7.1 Attachments: YARN-3006.001.patch When executing manual failover with automatic failover enabled, UnsupportedOperationException is thrown. {code} # yarn rmadmin -failover rm1 rm2 Exception in thread main java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51) at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94) at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311) at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622) {code} I'm thinking the above message is confusing to users. (Users may think whether ZKFC is configured correctly...) The command should output error message to stderr instead of throwing Exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525212#comment-14525212 ] Hadoop QA commented on YARN-3565: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 23s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 3s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 51s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 5m 49s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 52m 19s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 104m 29s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729929/YARN-3565-20150502-1.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / f1a152c | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7655/artifact/patchprocess/whitespace.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7655/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7655/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7655/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7655/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7655/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7655/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7655/console | This message was automatically generated. 
NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Attachments: YARN-3565-20150502-1.patch Now NM HB/Register uses Set<String>; it will be hard to add new fields if we want to support specifying NodeLabel type such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525218#comment-14525218 ] Hudson commented on YARN-3363: -- FAILURE: Integrated in Hadoop-Yarn-trunk #915 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/915/]) YARN-3363. add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. (zxu via rkanter) (rkanter: rev ac7d152901e29b1f444507fe4e421eb6e1402b5a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerStartMonitoringEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Fix For: 2.8.0 Attachments: YARN-3363.000.patch, YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Currently ContainerMetrics has container's actual memory usage(YARN-2984), actual CPU usage(YARN-3122), resource and pid(YARN-3022). It will be better to have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525283#comment-14525283 ] Hudson commented on YARN-2893: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2113 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2113/]) YARN-2893. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream. (Zhihai Xu via gera) (gera: rev f8204e241d9271497defd4d42646fb89c61cefe3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525285#comment-14525285 ] Hudson commented on YARN-3006: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2113 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2113/]) YARN-3006. Improve the error message when attempting manual failover with auto-failover enabled. (Akira AJISAKA via wangda) (wangda: rev 7d46a806e71de6692cd230e64e7de18a8252019d) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0, 2.7.1 Attachments: YARN-3006.001.patch When executing manual failover with automatic failover enabled, UnsupportedOperationException is thrown. {code} # yarn rmadmin -failover rm1 rm2 Exception in thread main java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51) at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94) at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311) at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622) {code} I'm thinking the above message is confusing to users. (Users may think whether ZKFC is configured correctly...) The command should output error message to stderr instead of throwing Exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525286#comment-14525286 ] Hudson commented on YARN-3363: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2113 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2113/]) YARN-3363. add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. (zxu via rkanter) (rkanter: rev ac7d152901e29b1f444507fe4e421eb6e1402b5a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerStartMonitoringEvent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Fix For: 2.8.0 Attachments: YARN-3363.000.patch, YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Currently ContainerMetrics has container's actual memory usage(YARN-2984), actual CPU usage(YARN-3122), resource and pid(YARN-3022). It will be better to have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525198#comment-14525198 ] Hudson commented on YARN-2893: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #181 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/181/]) YARN-2893. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream. (Zhihai Xu via gera) (gera: rev f8204e241d9271497defd4d42646fb89c61cefe3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525201#comment-14525201 ] Hudson commented on YARN-3363: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #181 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/181/]) YARN-3363. add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. (zxu via rkanter) (rkanter: rev ac7d152901e29b1f444507fe4e421eb6e1402b5a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerStartMonitoringEvent.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Fix For: 2.8.0 Attachments: YARN-3363.000.patch, YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Currently ContainerMetrics has container's actual memory usage(YARN-2984), actual CPU usage(YARN-3122), resource and pid(YARN-3022). It will be better to have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525200#comment-14525200 ] Hudson commented on YARN-3006: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #181 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/181/]) YARN-3006. Improve the error message when attempting manual failover with auto-failover enabled. (Akira AJISAKA via wangda) (wangda: rev 7d46a806e71de6692cd230e64e7de18a8252019d) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * hadoop-yarn-project/CHANGES.txt Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0, 2.7.1 Attachments: YARN-3006.001.patch When executing manual failover with automatic failover enabled, UnsupportedOperationException is thrown. {code} # yarn rmadmin -failover rm1 rm2 Exception in thread main java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51) at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94) at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311) at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622) {code} I'm thinking the above message is confusing to users. (Users may think whether ZKFC is configured correctly...) The command should output error message to stderr instead of throwing Exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened YARN-2764: -- counters.LimitExceededException shouldn't abort AsyncDispatcher --- Key: YARN-2764 URL: https://issues.apache.org/jira/browse/YARN-2764 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Ted Yu Labels: counters I saw the following in container log: {code} 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attemptattempt_1414221548789_0023_r_03_0 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24 2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1414221548789_0023Job Transitioned from RUNNING to COMMITTING 2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT 2014-10-25 10:28:55,177 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120 at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101) at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203) at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1754) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1737) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1718) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1089) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2049) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2045) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-10-25 10:28:55,185 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. {code} Counter limit was exceeded when JobFinishedEvent was created. Better handling of LimitExceededException should be provided so that AsyncDispatcher can continue functioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
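One way to get the "better handling" the description asks for is to treat the counter-limit overflow as a per-job condition instead of letting it escape the dispatcher thread. A sketch under the assumption that truncating the aggregated counters is acceptable; the issue does not confirm that this is the chosen fix.
{code}
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.counters.LimitExceededException;

// Sketch: aggregate task counters without letting a counter-limit overflow
// propagate out of the AsyncDispatcher thread.
public class SafeCounterAggregationSketch {
  public static Counters aggregate(Iterable<Counters> perTaskCounters) {
    Counters total = new Counters();
    for (Counters taskCounters : perTaskCounters) {
      try {
        total.incrAllCounters(taskCounters);
      } catch (LimitExceededException e) {
        // Stop aggregating instead of killing the dispatcher; the job can
        // still finish, with truncated counters.
        break;
      }
    }
    return total;
  }
}
{code}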
[jira] [Updated] (YARN-2454) Fix compareTo of variable UNBOUNDED in o.a.h.y.util.resource.Resources.
[ https://issues.apache.org/jira/browse/YARN-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2454: - Summary: Fix compareTo of variable UNBOUNDED in o.a.h.y.util.resource.Resources. (was: The function compareTo of variable UNBOUNDED in org.apache.hadoop.yarn.util.resource.Resources is definited wrong.) Fix compareTo of variable UNBOUNDED in o.a.h.y.util.resource.Resources. --- Key: YARN-2454 URL: https://issues.apache.org/jira/browse/YARN-2454 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0, 2.5.0, 2.4.1 Reporter: Xu Yang Assignee: Xu Yang Attachments: YARN-2454 -v2.patch, YARN-2454-patch.diff, YARN-2454.patch The variable UNBOUNDED implements the abstract class Resources and overrides the function compareTo, but there is something wrong in this function: we should not compare resources against zero the same way as for the variable NONE. We should change 0 to Integer.MAX_VALUE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
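The suggested fix is that comparisons against UNBOUNDED should behave as if its capacities were Integer.MAX_VALUE rather than 0. A minimal sketch of that comparison, with the anonymous-class plumbing of the real Resources.UNBOUNDED elided:
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Sketch of the corrected comparison for the UNBOUNDED singleton: compare as
// if its memory and vcores were Integer.MAX_VALUE, never 0.
final class UnboundedCompareSketch {
  static int compareToUnbounded(Resource other) {
    int diff = Integer.MAX_VALUE - other.getMemory();
    if (diff == 0) {
      diff = Integer.MAX_VALUE - other.getVirtualCores();
    }
    return diff;
  }
}
{code}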
[jira] [Commented] (YARN-1418) Add Tracing to YARN
[ https://issues.apache.org/jira/browse/YARN-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525259#comment-14525259 ] Masatake Iwasaki commented on YARN-1418: Hi [~djp]. We don't have a proposal yet, but I will write one and attach it in a few days. Add Tracing to YARN --- Key: YARN-1418 URL: https://issues.apache.org/jira/browse/YARN-1418 Project: Hadoop YARN Issue Type: Improvement Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Yi Liu Adding tracing using HTrace in the same way as HBASE-6449 and HDFS-5274. Most of the changes needed for the basics, such as RPC, seem to be almost ready in HDFS-5274. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1564) add some basic workflow YARN services
[ https://issues.apache.org/jira/browse/YARN-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525243#comment-14525243 ] Steve Loughran commented on YARN-1564: -- I should look at the tests for the execution service; we have a fork of these in Slider and they were failing on Windows unless the installation had the right path set up with all the Cygwin binaries (ls and the like). add some basic workflow YARN services - Key: YARN-1564 URL: https://issues.apache.org/jira/browse/YARN-1564 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Attachments: YARN-1564-001.patch Original Estimate: 24h Time Spent: 48h Remaining Estimate: 0h I've been using some alternative composite services to help build workflows of process execution in a YARN AM. They and their tests could be moved into YARN for use by others - this would make it easier to build aggregate services in an AM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525315#comment-14525315 ] Hudson commented on YARN-2893: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2131/]) YARN-2893. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream. (Zhihai Xu via gera) (gera: rev f8204e241d9271497defd4d42646fb89c61cefe3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525317#comment-14525317 ] Hudson commented on YARN-3006: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2131/]) YARN-3006. Improve the error message when attempting manual failover with auto-failover enabled. (Akira AJISAKA via wangda) (wangda: rev 7d46a806e71de6692cd230e64e7de18a8252019d) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0, 2.7.1 Attachments: YARN-3006.001.patch When executing manual failover with automatic failover enabled, UnsupportedOperationException is thrown. {code} # yarn rmadmin -failover rm1 rm2 Exception in thread main java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51) at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94) at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311) at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622) {code} I'm thinking the above message is confusing to users. (Users may think whether ZKFC is configured correctly...) The command should output error message to stderr instead of throwing Exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525318#comment-14525318 ] Hudson commented on YARN-3363: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2131 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2131/]) YARN-3363. add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. (zxu via rkanter) (rkanter: rev ac7d152901e29b1f444507fe4e421eb6e1402b5a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerStartMonitoringEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Fix For: 2.8.0 Attachments: YARN-3363.000.patch, YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Currently ContainerMetrics has container's actual memory usage(YARN-2984), actual CPU usage(YARN-3122), resource and pid(YARN-3022). It will be better to have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3480: --- Description: When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries(attempts), so it will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make RM recover process much slower. It might be better to set max attempts to be stored in RMStateStore. BTW: When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to a small value, retried attempts might be very large. So we need to delete some attempts stored in RMStateStore and RMStateStore. was:When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries(attempts), so it will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make RM recover process much slower. It might be better to set max attempts to be stored in RMStateStore. Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable Key: YARN-3480 URL: https://issues.apache.org/jira/browse/YARN-3480 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3480.01.patch, YARN-3480.02.patch When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries(attempts), so it will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make RM recover process much slower. It might be better to set max attempts to be stored in RMStateStore. BTW: When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to a small value, retried attempts might be very large. So we need to delete some attempts stored in RMStateStore and RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
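The description implies trimming the attempt history before it is persisted, keeping only the most recent attempts. Below is a sketch of that trimming step; the configuration key and the insertion-ordered map are assumptions for illustration, not the API introduced by the attached patches.
{code}
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: cap how many attempt records are kept for the state store.
public class AttemptHistoryTrimmerSketch {
  // Hypothetical key; the real patch may expose a different property name.
  public static final String MAX_STORED_ATTEMPTS_KEY =
      "yarn.resourcemanager.am.max-attempts-stored";

  /** Drops the oldest entries until at most maxStored attempts remain. */
  public static <K, V> void trimOldestAttempts(
      LinkedHashMap<K, V> attemptsInStartOrder, int maxStored) {
    Iterator<Map.Entry<K, V>> it = attemptsInStartOrder.entrySet().iterator();
    while (attemptsInStartOrder.size() > maxStored && it.hasNext()) {
      it.next();
      it.remove();
    }
  }
}
{code}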
[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high
[ https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525245#comment-14525245 ] Naganarasimha G R commented on YARN-3554: - Hi [~gtCarrera9], Thanks for commenting on this JIRA, but I did not completely get the intention: are you expecting me to merge the changes required for YARN-3518 here? If so, I had a few questions. 1. YARN-3518 tries to modify the default value of yarn.resourcemanager.connect.max-wait.ms from 90 to 60, which impacts not only the AM - RM timeout but also NM - RM and client (CLI, web, application report, etc.) - RM. Is that OK? (I am OK with it but just wanted to point it out.) 2. Given the current high availability, is it still valid to wait for 10 mins to detect that the RM has failed, or shall I decrease that too, to 3 mins? If you confirm, I can merge the changes of YARN-3518 and also update yarn-default.xml, which is missing in YARN-3518. Default value for maximum nodemanager connect wait time is too high --- Key: YARN-3554 URL: https://issues.apache.org/jira/browse/YARN-3554 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Naganarasimha G R Labels: newbie Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 msec, or 15 minutes, which is way too high. The default container expiry time from the RM and the default task timeout in MapReduce are both only 10 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3480: --- Attachment: YARN-3480.03.patch Update patch. Fix javac warning, checkstyple and test cases error. Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable Key: YARN-3480 URL: https://issues.apache.org/jira/browse/YARN-3480 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3480.01.patch, YARN-3480.02.patch, YARN-3480.03.patch When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries(attempts), so it will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make RM recover process much slower. It might be better to set max attempts to be stored in RMStateStore. BTW: When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to a small value, retried attempts might be very large. So we need to delete some attempts stored in RMStateStore and RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1946) need Public interface for WebAppUtils.getProxyHostAndPort
[ https://issues.apache.org/jira/browse/YARN-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525242#comment-14525242 ] Steve Loughran commented on YARN-1946: -- This has got more complex with the HA proxy support: there can be more than one (host, port) pair, and the caller needs to handle failover. It's still something that could be made public for anyone wanting to do their own AmIpFilter; they'd just need to add the failover logic themselves. need Public interface for WebAppUtils.getProxyHostAndPort - Key: YARN-1946 URL: https://issues.apache.org/jira/browse/YARN-1946 Project: Hadoop YARN Issue Type: Sub-task Components: api, webapp Affects Versions: 2.4.0 Reporter: Thomas Graves Priority: Critical ApplicationMasters are supposed to go through the ResourceManager web app proxy if they have web UIs, so they are properly secured. There is currently no public interface for Application Masters to conveniently get the proxy host and port. There is a function in WebAppUtils, but that class is private. We should provide this as a utility since any properly written AM will need to do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
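Until a public helper exists, an AM has to resolve the proxy address from configuration itself. The sketch below roughly mirrors what the private WebAppUtils logic is understood to do, assuming yarn.web-proxy.address falls back to the RM web address when unset; the RM HA case mentioned above, where several addresses must be tried, is deliberately left out.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ProxyAddressLookup {
  // Resolve the web proxy address: prefer a standalone proxy, otherwise the
  // proxy is embedded in the RM web application.
  public static String proxyHostAndPort(Configuration conf) {
    String proxy = conf.getTrimmed(YarnConfiguration.PROXY_ADDRESS);
    if (proxy == null || proxy.isEmpty()) {
      proxy = conf.get(YarnConfiguration.RM_WEBAPP_ADDRESS,
          YarnConfiguration.DEFAULT_RM_WEBAPP_ADDRESS);
    }
    return proxy;
  }

  public static void main(String[] args) {
    System.out.println(proxyHostAndPort(new YarnConfiguration()));
  }
}
{code}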
[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high
[ https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525264#comment-14525264 ] sandflee commented on YARN-3554: Hi [~Naganarasimha], 3 minutes seems dangerous. If the RM fails over and recovery takes several minutes, the NM may kill all of its containers, which is not expected in a production environment. Default value for maximum nodemanager connect wait time is too high --- Key: YARN-3554 URL: https://issues.apache.org/jira/browse/YARN-3554 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Naganarasimha G R Labels: newbie Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 msec, or 15 minutes, which is way too high. The default container expiry time from the RM and the default task timeout in MapReduce are both only 10 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525272#comment-14525272 ] Hudson commented on YARN-3006: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/172/]) YARN-3006. Improve the error message when attempting manual failover with auto-failover enabled. (Akira AJISAKA via wangda) (wangda: rev 7d46a806e71de6692cd230e64e7de18a8252019d) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * hadoop-yarn-project/CHANGES.txt Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Fix For: 2.8.0, 2.7.1 Attachments: YARN-3006.001.patch When executing manual failover with automatic failover enabled, UnsupportedOperationException is thrown.
{code}
# yarn rmadmin -failover rm1 rm2
Exception in thread "main" java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address
    at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51)
    at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94)
    at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311)
    at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282)
    at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449)
    at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378)
    at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622)
{code}
I'm thinking the above message is confusing to users. (Users may wonder whether ZKFC is configured correctly...) The command should output an error message to stderr instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
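A minimal sketch of the behaviour being requested, making no claim about the committed patch: detect the unsupported combination up front and report it on stderr with a non-zero exit code instead of letting the exception escape. The class and method names here are purely illustrative.
{code}
public class FailoverCommandCheck {
  // Returns a shell-style exit code instead of letting an exception escape.
  public static int runFailover(boolean autoFailoverEnabled, String from, String to) {
    if (autoFailoverEnabled) {
      System.err.println("failover: automatic failover is enabled; manual failover from "
          + from + " to " + to + " is not supported. Disable automatic failover first.");
      return -1; // non-zero exit, no stack trace for the user to puzzle over
    }
    // ... perform the manual failover here ...
    return 0;
  }

  public static void main(String[] args) {
    System.exit(runFailover(true, "rm1", "rm2"));
  }
}
{code}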
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525270#comment-14525270 ] Hudson commented on YARN-2893: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/172/]) YARN-2893. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream. (Zhihai Xu via gera) (gera: rev f8204e241d9271497defd4d42646fb89c61cefe3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Fix For: 2.8.0 Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch, YARN-2893.003.patch, YARN-2893.004.patch, YARN-2893.005.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525273#comment-14525273 ] Hudson commented on YARN-3363: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #172 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/172/]) YARN-3363. add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. (zxu via rkanter) (rkanter: rev ac7d152901e29b1f444507fe4e421eb6e1402b5a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerStartMonitoringEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java * hadoop-yarn-project/CHANGES.txt add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Fix For: 2.8.0 Attachments: YARN-3363.000.patch, YARN-3363.001.patch add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Currently ContainerMetrics has container's actual memory usage(YARN-2984), actual CPU usage(YARN-3122), resource and pid(YARN-3022). It will be better to have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2123) Progress bars in Web UI always at 100% due to non-US locale
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2123: Attachment: YARN-2123-004.patch Thank you [~ozawa]. Attached v4 patch. Progress bars in Web UI always at 100% due to non-US locale --- Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Attachments: NaN_after_launching_RM.png, YARN-2123-001.patch, YARN-2123-002.patch, YARN-2123-003.patch, YARN-2123-004.patch, fair-scheduler-ajisaka.xml, screenshot-noPatch.png, screenshot-patch.png, screenshot.png, yarn-site-ajisaka.xml In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
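The failure mode is easy to reproduce in isolation. The snippet below assumes the progress-bar width is produced by a format call that uses the JVM's default locale; the actual web UI goes through its own formatting utilities, so treat this as an illustration of the mechanism rather than the exact code path.
{code}
import java.util.Locale;

public class LocaleProgressDemo {
  public static void main(String[] args) {
    float progress = 32.82f;
    // A non-US locale formats the decimal mark as a comma, which browsers
    // will not parse as part of a CSS percentage width.
    System.out.println(String.format(Locale.GERMANY, "width:%.1f%%", progress)); // width:32,8%
    System.out.println(String.format(Locale.US, "width:%.1f%%", progress));      // width:32.8%
  }
}
{code}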
[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525374#comment-14525374 ] Junping Du commented on YARN-2892: -- Sure, [~leftnoteasy], I agree. Please go ahead. Unable to get AMRMToken in unmanaged AM when using a secure cluster --- Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan Assignee: Sevada Abraamyan Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client it makes a simple security check whether it should include the AMRMToken in the report (See createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (See submitApplication in ClientRmService). Afterwards when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the realm is stripped from the principal when we request a short username. So for example the short username might be Foo whereas the full username is f...@company.com Note: A very similar problem has been previously reported ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
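A tiny self-contained illustration of the mismatch described above. In Hadoop the two forms correspond to UserGroupInformation.getShortUserName() and getUserName(); the realm stripping below is a simplified stand-in, and the user names are made up.
{code}
public class PrincipalMismatchDemo {
  // Simplified stand-in for deriving a short name from a Kerberos principal.
  static String shortName(String principal) {
    int at = principal.indexOf('@');
    return at < 0 ? principal : principal.substring(0, at);
  }

  public static void main(String[] args) {
    String storedOwner = shortName("foo@COMPANY.COM"); // saved at submission time: "foo"
    String requester = "foo@COMPANY.COM";              // full name used at report time

    System.out.println(storedOwner.equals(requester));            // false -> token omitted
    System.out.println(storedOwner.equals(shortName(requester))); // true  -> intended check
  }
}
{code}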
[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high
[ https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525406#comment-14525406 ] Li Lu commented on YARN-3554: - Hi [~Naganarasimha], I just wanted to bring that JIRA to your attention. We may want to share the discussion across both JIRAs. Default value for maximum nodemanager connect wait time is too high --- Key: YARN-3554 URL: https://issues.apache.org/jira/browse/YARN-3554 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Naganarasimha G R Labels: newbie Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 msec, or 15 minutes, which is way too high. The default container expiry time from the RM and the default task timeout in MapReduce are both only 10 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3422) relatedentities always return empty list when primary filter is set
[ https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525410#comment-14525410 ] Billie Rinaldi commented on YARN-3422: -- Let’s say we post entity A with related entity B and primary filter C. This implies a directional relationship B - A. The entries stored include the following:
{noformat}
entity entry: A (with hidden B)
related entity entry: B A
primary filter entry: C A (no B)
{noformat}
The patch submitted adds primary filter entry C B A, which is not correct for the existing design because C was posted as a primary filter for A, not for B. What we might want (that the store does not currently do) is for A to be added under primary filter entries for B (i.e. D B A, where D is a primary filter for B). One problem with doing this is that we do not know the primary filters for B when we are posting entity A, and a further problem is that the primary filters for B could change over time and have to be kept up to date. relatedentities always return empty list when primary filter is set --- Key: YARN-3422 URL: https://issues.apache.org/jira/browse/YARN-3422 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3422.1.patch When you curl for ats entities with a primary filter, the relatedentities fields always return empty list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
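For readers less familiar with the timeline write path, here is a rough sketch of how the scenario above would be produced with the timeline client API; the entity type, ids and filter value are made up, and posting is only indicated in a comment.
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class RelatedEntityExample {
  public static TimelineEntity buildEntityA() {
    TimelineEntity a = new TimelineEntity();
    a.setEntityType("EXAMPLE_TYPE");
    a.setEntityId("A");
    // Recorded as the directional related-entity entry from B to A in the store.
    a.addRelatedEntity("EXAMPLE_TYPE", "B");
    // The primary filter C is attached to A only; B knows nothing about it.
    a.addPrimaryFilter("C", "someValue");
    return a;
  }

  public static void main(String[] args) {
    // Posting would be done with TimelineClient.putEntities(buildEntityA()).
    System.out.println(buildEntityA().getEntityId());
  }
}
{code}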
[jira] [Commented] (YARN-2454) Fix compareTo of variable UNBOUNDED in o.a.h.y.util.resource.Resources.
[ https://issues.apache.org/jira/browse/YARN-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525362#comment-14525362 ] Junping Du commented on YARN-2454: -- Also, congratulations to [~yxls123123] on the first patch contribution to Apache Hadoop! Fix compareTo of variable UNBOUNDED in o.a.h.y.util.resource.Resources. --- Key: YARN-2454 URL: https://issues.apache.org/jira/browse/YARN-2454 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0, 2.5.0, 2.4.1 Reporter: Xu Yang Assignee: Xu Yang Fix For: 2.8.0 Attachments: YARN-2454 -v2.patch, YARN-2454-patch.diff, YARN-2454.patch The variable UNBOUNDED implements the abstract class Resources and overrides the function compareTo, but there is something wrong in this function: we should not compare resources against zero in the same way the variable NONE does. We should change 0 to Integer.MAX_VALUE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2454) Fix compareTo of variable UNBOUNDED in o.a.h.y.util.resource.Resources.
[ https://issues.apache.org/jira/browse/YARN-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525395#comment-14525395 ] Hudson commented on YARN-2454: -- FAILURE: Integrated in Hadoop-trunk-Commit #7717 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7717/]) YARN-2454. Fix compareTo of variable UNBOUNDED in o.a.h.y.util.resource.Resources. Contributed by Xu Yang. (junping_du: rev 57d9a972cbd62aae0ab010d38a0973619972edd6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResources.java * hadoop-yarn-project/CHANGES.txt Fix compareTo of variable UNBOUNDED in o.a.h.y.util.resource.Resources. --- Key: YARN-2454 URL: https://issues.apache.org/jira/browse/YARN-2454 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0, 2.5.0, 2.4.1 Reporter: Xu Yang Assignee: Xu Yang Fix For: 2.8.0 Attachments: YARN-2454 -v2.patch, YARN-2454-patch.diff, YARN-2454.patch The variable UNBOUNDED implement the abstract class Resources, and override the function compareTo. But there is something wrong in this function. We should not compare resources with zero as the same as the variable NONE. We should change 0 to Integer.MAX_VALUE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
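Spelling out the fix described above as a reduced, self-contained model: UNBOUNDED must compare as larger than any real resource, so the comparison has to be anchored at Integer.MAX_VALUE rather than 0 (which is what NONE correctly uses). Class and field handling are simplified; the real change lives in the UNBOUNDED instance inside o.a.h.y.util.resource.Resources.
{code}
public class UnboundedCompareSketch {
  // UNBOUNDED must sort above every real resource, so the comparison is
  // based on Integer.MAX_VALUE instead of 0.
  static int compareUnboundedTo(int otherMemory, int otherVcores) {
    int diff = Integer.MAX_VALUE - otherMemory; // was: 0 - otherMemory
    if (diff == 0) {
      diff = Integer.MAX_VALUE - otherVcores;   // was: 0 - otherVcores
    }
    return diff;
  }

  public static void main(String[] args) {
    // Compared with an 8 GB / 4 vcore resource, UNBOUNDED must come out larger.
    System.out.println(compareUnboundedTo(8192, 4) > 0); // true
  }
}
{code}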
[jira] [Commented] (YARN-3513) Remove unused variables in ContainersMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525412#comment-14525412 ] Li Lu commented on YARN-3513: - No, I was talking about these lines in YARN-3334:
{code}
+try {
+  TimelineClient timelineClient = context.getApplications().get(
+      containerId.getApplicationAttemptId().getApplicationId()).
+      getTimelineClient();
+  putEntityWithoutBlocking(timelineClient, entity);
+}
{code}
which reference context and will have problems with
{code}
- private final Context context;
{code}
This may be fine in trunk, but since YARN-2928 needs to be merged back in the near future, we may not want to make the change on context for now. We need to consider code cleanup comprehensively when we're doing the branch merge. Remove unused variables in ContainersMonitorImpl Key: YARN-3513 URL: https://issues.apache.org/jira/browse/YARN-3513 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3513.20150421-1.patch class members : {{private final Context context;}} and some local variables in MonitoringThread.run() : {{vmemStillInUsage and pmemStillInUsage}} are not used and just updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3422) relatedentities always return empty list when primary filter is set
[ https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525417#comment-14525417 ] Billie Rinaldi commented on YARN-3422: -- In retrospect, the directional nature of the related entity relationship seems to make things more confusing. Perhaps it would be better if relatedness were bidirectional. relatedentities always return empty list when primary filter is set --- Key: YARN-3422 URL: https://issues.apache.org/jira/browse/YARN-3422 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3422.1.patch When you curl for ats entities with a primary filter, the relatedentities fields always return empty list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2123) Progress bars in Web UI always at 100% due to non-US locale
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525336#comment-14525336 ] Akira AJISAKA commented on YARN-2123: - and thanks [~xgong] for pinging me. Progress bars in Web UI always at 100% due to non-US locale --- Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Attachments: NaN_after_launching_RM.png, YARN-2123-001.patch, YARN-2123-002.patch, YARN-2123-003.patch, YARN-2123-004.patch, fair-scheduler-ajisaka.xml, screenshot-noPatch.png, screenshot-patch.png, screenshot.png, yarn-site-ajisaka.xml In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3097) Logging of resource recovery on NM restart has redundancies
[ https://issues.apache.org/jira/browse/YARN-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-3097: - Attachment: YARN-3097.001.patch Logging of resource recovery on NM restart has redundancies --- Key: YARN-3097 URL: https://issues.apache.org/jira/browse/YARN-3097 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Eric Payne Priority: Minor Labels: newbie Attachments: YARN-3097.001.patch ResourceLocalizationService logs that it is recovering a resource with the remote and local paths, but then very shortly afterwards the LocalizedResource emits an INIT-LOCALIZED transition that also logs the same remote and local paths. The recovery message should be a debug message, since it's not conveying any useful information that isn't already covered by the resource state transition log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
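The direction suggested in the description, as a hedged sketch: demote the recovery message to debug level and guard it, since the same paths are already logged by the resource state transition. The class and message below are stand-ins, not the ResourceLocalizationService code itself.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class RecoveryLoggingSketch {
  private static final Log LOG = LogFactory.getLog(RecoveryLoggingSketch.class);

  // The recovery message becomes debug-level and is guarded; the state
  // transition log already records the same remote and local paths.
  void recoverResource(String remotePath, String localPath) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Recovering localized resource " + remotePath + " at " + localPath);
    }
    // ... actual recovery work would go here ...
  }
}
{code}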
[jira] [Commented] (YARN-1418) Add Tracing to YARN
[ https://issues.apache.org/jira/browse/YARN-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525381#comment-14525381 ] Junping Du commented on YARN-1418: -- No worries, [~iwasakims]; I was just curious about this feature, as we typically have some writeup for an umbrella JIRA so other contributors can help. Add Tracing to YARN --- Key: YARN-1418 URL: https://issues.apache.org/jira/browse/YARN-1418 Project: Hadoop YARN Issue Type: Improvement Components: api, nodemanager, resourcemanager Reporter: Masatake Iwasaki Assignee: Yi Liu Adding tracing using HTrace in the same way as HBASE-6449 and HDFS-5274. Most of the changes needed for the basics, such as RPC, seem to be almost ready in HDFS-5274. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3570) Non-zero exit status of master application not propagated
Eric O. LEBIGOT (EOL) created YARN-3570: --- Summary: Non-zero exit status of master application not propagated Key: YARN-3570 URL: https://issues.apache.org/jira/browse/YARN-3570 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Environment: PySpark on AWS EMR Reporter: Eric O. LEBIGOT (EOL) The master of my application fails, but the Final app status is 0. This causes all sorts of problems (EMR not detecting a problem, my data pipeline continuing, etc.). Here is what happens. The master fails (showing only the relevant lines from daemons/i-…/yarn-hadoop-nodemanager-ip-….log.gz):
{quote}
2015-05-02 03:32:11,000 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #0): Exit code from container container_1430537363277_0001_01_01 is : 1
2015-05-02 03:32:11,001 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #0): Exception from container-launch with container ID: container_1430537363277_0001_01_01 and exit code: 1
2015-05-02 03:32:11,003 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (ContainersLauncher #0): Container exited with a non-zero exit code 1
2015-05-02 03:32:11,004 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container (AsyncDispatcher event handler): Container container_1430537363277_0001_01_01 transitioned from RUNNING to EXITED_WITH_FAILURE
2015-05-02 03:32:11,032 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger (AsyncDispatcher event handler): USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1430537363277_0001 CONTAINERID=container_1430537363277_0001_01_01
2015-05-02 03:32:11,032 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container (AsyncDispatcher event handler): Container container_1430537363277_0001_01_01 transitioned from EXITED_WITH_FAILURE to DONE
{quote}
and, from ./daemons/i-…/yarn-hadoop-resourcemanager-ip-….log.gz:
{quote}
2015-05-02 03:32:10,493 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl (AsyncDispatcher event handler): Updating application attempt appattempt_1430537363277_0001_01 with final state: FINISHING, and exit status: -1000
{quote}
Now, the whole application nonetheless strangely returns a 0 exit code, in ./task-attempts/application_1430537363277_0001/container_1430537363277_0001_01_01/stderr.gz:
{quote}
15/05/02 03:32:10 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
{quote}
The reason the error is hidden may be indicated by that last message (a shutdown hook running before the final status was reported). Now, is this a possible YARN bug, or is it more likely that something is happening with the AWS EMR cluster manager that I am using (maybe it detects a task failure before YARN does and shuts down the PySpark application that was running on YARN)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3571) AM does not re-blacklist NMs after ignoring-blacklist event happens?
Hao Zhu created YARN-3571: - Summary: AM does not re-blacklist NMs after ignoring-blacklist event happens? Key: YARN-3571 URL: https://issues.apache.org/jira/browse/YARN-3571 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.1 Reporter: Hao Zhu A detailed analysis is in item 3, "Will AM re-blacklist NMs after ignoring-blacklist event happens?", at the link below: http://www.openkb.info/2015/05/when-will-application-master-blacklist.html The current behavior is: if that NodeManager has ever been blacklisted before, it will not be blacklisted again after ignore-blacklist happens; otherwise, it will be blacklisted. The code logic is in the function containerFailedOnHost(String hostName) of RMContainerRequestor.java:
{code}
protected void containerFailedOnHost(String hostName) {
  if (!nodeBlacklistingEnabled) {
    return;
  }
  if (blacklistedNodes.contains(hostName)) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Host " + hostName + " is already blacklisted.");
    }
    return; // already blacklisted
{code}
The reason for the above behavior is in item 2 above: when ignoring-blacklist happens, the AM only asks the RM to clear blacklistAdditions; it does not clear the blacklistedNodes variable. This behavior may cause the whole job/application to fail if the previously blacklisted NM is released after the ignoring-blacklist event happens. Imagine a serial murderer being released from prison just because the prison is 33% full and, horribly, never being put back in prison again; only new murderers will be put in prison. Example to demonstrate: Test 1: one node (h4) has an issue, and the other 3 nodes are healthy. The job failed with the AM logs below:
{code}
[root@h1 container_1430425729977_0006_01_01]# egrep -i 'failures on node|blacklist|FATAL' syslog
2015-05-02 18:38:41,246 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
2015-05-02 18:38:41,246 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 1
2015-05-02 18:39:07,249 FATAL [IPC Server handler 3 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_02_0 - exited : java.io.IOException: Spill failed
2015-05-02 18:39:07,297 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node h4.poc.com
2015-05-02 18:39:07,950 FATAL [IPC Server handler 16 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_08_0 - exited : java.io.IOException: Spill failed
2015-05-02 18:39:07,954 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 2 failures on node h4.poc.com
2015-05-02 18:39:08,148 FATAL [IPC Server handler 17 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_07_0 - exited : java.io.IOException: Spill failed
2015-05-02 18:39:08,152 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 3 failures on node h4.poc.com
2015-05-02 18:39:08,152 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Blacklisted host h4.poc.com
2015-05-02 18:39:08,561 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Update the blacklist for application_1430425729977_0006: blacklistAdditions=1 blacklistRemovals=0
2015-05-02 18:39:08,561 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Ignore blacklisting set to true. Known: 4, Blacklisted: 1, 25%
2015-05-02 18:39:09,563 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Update the blacklist for application_1430425729977_0006: blacklistAdditions=0 blacklistRemovals=1
2015-05-02 18:39:32,912 FATAL [IPC Server handler 19 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_02_1 - exited : java.io.IOException: Spill failed
2015-05-02 18:39:35,076 FATAL [IPC Server handler 1 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_09_0 - exited : java.io.IOException: Spill failed
2015-05-02 18:39:35,133 FATAL [IPC Server handler 5 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_08_1 - exited : java.io.IOException: Spill failed
2015-05-02 18:39:57,308 FATAL [IPC Server handler 17 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_02_2 - exited : java.io.IOException: Spill failed
2015-05-02 18:40:00,174 FATAL [IPC Server handler 10 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_09_1 - exited : java.io.IOException: Spill failed
2015-05-02
[jira] [Commented] (YARN-2123) Progress bars in Web UI always at 100% due to non-US locale
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525494#comment-14525494 ] Hadoop QA commented on YARN-2123: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 49s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 24s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 8m 42s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 62m 23s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 115m 4s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729948/YARN-2123-004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6ae2a0d | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/7657/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7657/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7657/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7657/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7657/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7657/console | This message was automatically generated. 
Progress bars in Web UI always at 100% due to non-US locale --- Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Attachments: NaN_after_launching_RM.png, YARN-2123-001.patch, YARN-2123-002.patch, YARN-2123-003.patch, YARN-2123-004.patch, fair-scheduler-ajisaka.xml, screenshot-noPatch.png, screenshot-patch.png, screenshot.png, yarn-site-ajisaka.xml In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3097) Logging of resource recovery on NM restart has redundancies
[ https://issues.apache.org/jira/browse/YARN-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525461#comment-14525461 ] Hadoop QA commented on YARN-3097: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 37s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 5m 50s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 41m 42s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729950/YARN-3097.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6ae2a0d | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7658/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7658/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7658/console | This message was automatically generated. Logging of resource recovery on NM restart has redundancies --- Key: YARN-3097 URL: https://issues.apache.org/jira/browse/YARN-3097 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Eric Payne Priority: Minor Labels: newbie Attachments: YARN-3097.001.patch ResourceLocalizationService logs that it is recovering a resource with the remote and local paths, but then very shortly afterwards the LocalizedResource emits an INIT-LOCALIZED transition that also logs the same remote and local paths. The recovery message should be a debug message, since it's not conveying any useful information that isn't already covered by the resource state transition log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3385) Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.
[ https://issues.apache.org/jira/browse/YARN-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525477#comment-14525477 ] Hadoop QA commented on YARN-3385: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 9s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 48s | The applied patch generated 1 new checkstyle issues (total was 42, now 43). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 49m 49s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 87m 25s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729901/YARN-3385.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6ae2a0d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7656/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7656/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7656/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7656/console | This message was automatically generated. Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion. --- Key: YARN-3385 URL: https://issues.apache.org/jira/browse/YARN-3385 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3385.000.patch, YARN-3385.001.patch Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion(Op.delete). The race condition is similar as YARN-3023. since the race condition exists for ZK node creation, it should also exist for ZK node deletion. We see this issue with the following stack trace: {code} 2015-03-17 19:18:58,958 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. 
Cause: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:857)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:854)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:854)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:647)
    at
[jira] [Commented] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525491#comment-14525491 ] Hadoop QA commented on YARN-3480: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 13s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 37s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 52m 17s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 91m 30s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729945/YARN-3480.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6ae2a0d | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7659/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7659/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7659/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7659/console | This message was automatically generated. Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable Key: YARN-3480 URL: https://issues.apache.org/jira/browse/YARN-3480 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3480.01.patch, YARN-3480.02.patch, YARN-3480.03.patch When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries(attempts), so it will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make RM recover process much slower. It might be better to set max attempts to be stored in RMStateStore. BTW: When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to a small value, retried attempts might be very large. So we need to delete some attempts stored in RMStateStore and RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1832) wrong MockLocalizerStatus.equals() method implementation
[ https://issues.apache.org/jira/browse/YARN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525495#comment-14525495 ] Hadoop QA commented on YARN-1832: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 14s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 35s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 0s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 5m 47s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 22m 36s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12634678/YARN-1832.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 6ae2a0d | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7661/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7661/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7661/console | This message was automatically generated. wrong MockLocalizerStatus.equals() method implementation Key: YARN-1832 URL: https://issues.apache.org/jira/browse/YARN-1832 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Trivial Attachments: YARN-1832.patch return getLocalizerId().equals(other) ...; should be return getLocalizerId().equals(other. getLocalizerId()) ...; getLocalizerId() returns String. It's expected to compare this.getLocalizerId() against other.getLocalizerId(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
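The corrected comparison from the description, shown as a small self-contained sketch (the real MockLocalizerStatus has more fields; only the id comparison is modelled here):
{code}
public class MockLocalizerStatusEqualsSketch {
  private final String localizerId;

  MockLocalizerStatusEqualsSketch(String localizerId) {
    this.localizerId = localizerId;
  }

  String getLocalizerId() {
    return localizerId;
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof MockLocalizerStatusEqualsSketch)) {
      return false;
    }
    MockLocalizerStatusEqualsSketch other = (MockLocalizerStatusEqualsSketch) obj;
    // was: getLocalizerId().equals(other) -- comparing a String to the whole object,
    // which is always false; compare the two ids instead.
    return getLocalizerId().equals(other.getLocalizerId());
  }

  @Override
  public int hashCode() {
    return localizerId.hashCode();
  }
}
{code}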
[jira] [Updated] (YARN-1993) Cross-site scripting vulnerability in TextView.java
[ https://issues.apache.org/jira/browse/YARN-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-1993: - Assignee: Kenji Kikushima Cross-site scripting vulnerability in TextView.java --- Key: YARN-1993 URL: https://issues.apache.org/jira/browse/YARN-1993 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Ted Yu Assignee: Kenji Kikushima Attachments: YARN-1993.patch In hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/TextView.java , method echo() e.g. : {code} for (Object s : args) { out.print(s); } {code} Printing s to an HTML page allows cross-site scripting, because it was not properly sanitized for context HTML attribute name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
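One possible mitigation direction, shown only as a sketch and not as the attached patch: HTML-escape anything echoed into the page instead of printing it raw. This assumes commons-lang 2.x (which Hadoop of this era ships) for the escaping helper.
{code}
import java.io.PrintWriter;
import org.apache.commons.lang.StringEscapeUtils;

public class EscapingEchoSketch {
  // Escapes each argument before it reaches the HTML page, so markup in
  // user-controlled values is rendered as text rather than executed.
  static void echo(PrintWriter out, Object... args) {
    for (Object s : args) {
      out.print(StringEscapeUtils.escapeHtml(String.valueOf(s)));
    }
  }

  public static void main(String[] args) {
    PrintWriter out = new PrintWriter(System.out, true);
    echo(out, "<script>alert('xss')</script>");
    out.println();
  }
}
{code}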
[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525543#comment-14525543 ] Hadoop QA commented on YARN-679: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 31 new or modified test files. | | {color:red}-1{color} | javac | 7m 32s | The applied patch generated 130 additional warning messages. | | {color:red}-1{color} | javadoc | 9m 33s | The applied patch generated 3 additional warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 6s | The applied patch generated 150 new checkstyle issues (total was 140, now 287). | | {color:red}-1{color} | whitespace | 0m 7s | The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 40s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | common tests | 22m 21s | Tests failed in hadoop-common. | | | | 59m 38s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.service.launcher.TestServiceLaunchNoArgsAllowed | | | hadoop.service.launcher.TestServiceLaunchedRunning | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12653051/YARN-679-003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6ae2a0d | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/7664/artifact/patchprocess/diffJavacWarnings.txt | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7664/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7664/artifact/patchprocess/diffcheckstylehadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7664/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7664/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7664/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7664/console | This message was automatically generated. 
add an entry point that can start any Yarn service -- Key: YARN-679 URL: https://issues.apache.org/jira/browse/YARN-679 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-679-001.patch, YARN-679-002.patch, YARN-679-002.patch, YARN-679-003.patch, org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf Time Spent: 72h Remaining Estimate: 0h There's no need to write separate .main classes for every Yarn service, given that the startup mechanism should be identical: create, init, start, wait for stopped - with an interrupt handler to trigger a clean shutdown on a control-C interrupt. Provide one that takes any classname and a list of config files/options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
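A minimal sketch of the generic entry point described above, assuming the target class implements org.apache.hadoop.service.Service and has a no-argument constructor. The real patch layers interrupt handling, exit codes, and configuration-file loading on top of a skeleton like this.
{code}
import java.util.concurrent.CountDownLatch;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.Service;

public class ServiceEntryPoint {
  public static void main(String[] args) throws Exception {
    if (args.length < 1) {
      System.err.println("Usage: ServiceEntryPoint <service-classname>");
      System.exit(1);
    }
    // create
    final Service service = (Service) Class.forName(args[0]).newInstance();
    // init + start
    service.init(new Configuration());
    service.start();
    // clean shutdown on Ctrl-C: stop the service before the JVM exits
    Runtime.getRuntime().addShutdownHook(new Thread() {
      @Override
      public void run() {
        service.stop();
      }
    });
    // wait for stopped (here: park the main thread until the process is terminated)
    new CountDownLatch(1).await();
  }
}
{code}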
[jira] [Commented] (YARN-1993) Cross-site scripting vulnerability in TextView.java
[ https://issues.apache.org/jira/browse/YARN-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525526#comment-14525526 ] Hadoop QA commented on YARN-1993: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 13s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:red}-1{color} | javac | 7m 47s | The applied patch generated 173 additional warning messages. | | {color:red}-1{color} | javadoc | 10m 4s | The applied patch generated 14 additional warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 53s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 24s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | | | 39m 51s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12644792/YARN-1993.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6ae2a0d | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/7663/artifact/patchprocess/diffJavacWarnings.txt | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/7663/artifact/patchprocess/diffJavadocWarnings.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7663/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7663/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7663/console | This message was automatically generated. Cross-site scripting vulnerability in TextView.java --- Key: YARN-1993 URL: https://issues.apache.org/jira/browse/YARN-1993 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Ted Yu Attachments: YARN-1993.patch In hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/TextView.java , method echo() e.g. : {code} for (Object s : args) { out.print(s); } {code} Printing s to an HTML page allows cross-site scripting, because it was not properly sanitized for context HTML attribute name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525527#comment-14525527 ] Hadoop QA commented on YARN-1878: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 52s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 42s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | | | 41m 4s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12636856/YARN-1878.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6ae2a0d | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7662/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7662/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7662/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7662/console | This message was automatically generated. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that some times it can take upto 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1993) Cross-site scripting vulnerability in TextView.java
[ https://issues.apache.org/jira/browse/YARN-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525542#comment-14525542 ] Tsuyoshi Ozawa commented on YARN-1993: -- +1, committing this shortly. Cross-site scripting vulnerability in TextView.java --- Key: YARN-1993 URL: https://issues.apache.org/jira/browse/YARN-1993 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Ted Yu Assignee: Kenji Kikushima Attachments: YARN-1993.patch In hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/TextView.java, the echo() method contains, e.g.: {code} for (Object s : args) { out.print(s); } {code} Printing s to an HTML page allows cross-site scripting, because the value is not properly sanitized for the HTML context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1106) The RM should point the tracking url to the RM app page if its empty
[ https://issues.apache.org/jira/browse/YARN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1452#comment-1452 ] Hadoop QA commented on YARN-1106: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 33s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 3m 18s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12637253/YARN-1106.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6ae2a0d | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7665/console | This message was automatically generated. The RM should point the tracking url to the RM app page if its empty Key: YARN-1106 URL: https://issues.apache.org/jira/browse/YARN-1106 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1106.patch, YARN-1106.patch It would be nice if the Resourcemanager set the tracking url to the RM app page if the application master doesn't pass one or passes the empty string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
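A hedged sketch of the requested fallback behaviour; the method name and the /cluster/app path below are assumptions made for illustration, not taken from the attached patches:
{code}
// Hypothetical illustration of the requested behaviour, not the YARN-1106 patch.
public class TrackingUrlDefaulting {
  // If the AM passes no tracking URL (null or empty), fall back to the RM's own app page.
  static String resolveTrackingUrl(String amTrackingUrl, String rmWebAppAddress,
      String applicationId) {
    if (amTrackingUrl == null || amTrackingUrl.trim().isEmpty()) {
      return rmWebAppAddress + "/cluster/app/" + applicationId;
    }
    return amTrackingUrl;
  }

  public static void main(String[] args) {
    // prints http://rm-host:8088/cluster/app/application_1234_0001
    System.out.println(resolveTrackingUrl("", "http://rm-host:8088", "application_1234_0001"));
  }
}
{code}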
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525660#comment-14525660 ] Hadoop QA commented on YARN-2821: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 22s | The applied patch generated 3 new checkstyle issues (total was 46, now 49). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 36s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-applications-distributedshell. | | | | 42m 18s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12680098/apache-yarn-2821.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e8d0ee5 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7669/artifact/patchprocess/diffcheckstylehadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7669/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7669/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7669/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 
14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525672#comment-14525672 ] Hadoop QA commented on YARN-2768: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 14s | The applied patch generated 1 new checkstyle issues (total was 6, now 7). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 38s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 52m 8s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 92m 33s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12677855/YARN-2768.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e8d0ee5 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7668/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7668/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7668/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7668/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7668/console | This message was automatically generated. optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). 
The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.multiply(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(clone(lhs), by); } {code} The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
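A sketch of that suggestion, accumulating directly into the existing demand object so no intermediate Resource is cloned. It reuses the fields shown in the snippet above and is an illustration of the idea, not the attached patch:
{code}
// Illustrative rewrite of FSAppAttempt.updateDemand(), not YARN-2768.patch.
// Accumulates into 'demand' in place, avoiding the clone inside Resources.multiply().
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        // equivalent of addTo(demand, multiply(capability, numContainers)),
        // without allocating a temporary Resource per request
        Resource cap = r.getCapability();
        int n = r.getNumContainers();
        demand.setMemory(demand.getMemory() + cap.getMemory() * n);
        demand.setVirtualCores(demand.getVirtualCores() + cap.getVirtualCores() * n);
      }
    }
  }
}
{code}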
[jira] [Updated] (YARN-3571) AM does not re-blacklist NMs after ignoring-blacklist event happens?
[ https://issues.apache.org/jira/browse/YARN-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Zhu updated YARN-3571: -- Description: Detailed analysis are in item 3 Will AM re-blacklist NMs after ignoring-blacklist event happens? of below link: http://www.openkb.info/2015/05/when-will-application-master-blacklist.html The current behavior is : if that Node Manager has ever been blacklisted before, then it will not be blacklisted again after ignore-blacklist happens; Else, it will be blacklisted. However I think the right behavior should be : AM can re-blacklist NMs even after ignoring-blacklist happens once. The code logic is in function containerFailedOnHost(String hostName) of RMContainerRequestor.java: {code} protected void containerFailedOnHost(String hostName) { if (!nodeBlacklistingEnabled) { return; } if (blacklistedNodes.contains(hostName)) { if (LOG.isDebugEnabled()) { LOG.debug(Host + hostName + is already blacklisted.); } return; //already blacklisted {code} The reason of above behavior is in above item 2: when ignoring-blacklist happens, it only ask RM to clear blacklistAdditions, however it dose not clear the blacklistedNodes variable. This behavior may cause the whole job/application to fail if the previous blacklisted NM was released after ignoring-blacklist event happens. Imagine a serial murder is released from prison just because the prison is 33% full, and horribly he/she will never be put in prison again. Only new murder will be put in prison. Example to prove: Test 1: One node(h4) has issue, other 3 nodes are healthy. The job failed with below AM logs: {code} [root@h1 container_1430425729977_0006_01_01]# egrep -i 'failures on node|blacklist|FATAL' syslog 2015-05-02 18:38:41,246 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true 2015-05-02 18:38:41,246 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 1 2015-05-02 18:39:07,249 FATAL [IPC Server handler 3 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_02_0 - exited : java.io.IOException: Spill failed 2015-05-02 18:39:07,297 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node h4.poc.com 2015-05-02 18:39:07,950 FATAL [IPC Server handler 16 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_08_0 - exited : java.io.IOException: Spill failed 2015-05-02 18:39:07,954 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 2 failures on node h4.poc.com 2015-05-02 18:39:08,148 FATAL [IPC Server handler 17 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_07_0 - exited : java.io.IOException: Spill failed 2015-05-02 18:39:08,152 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 3 failures on node h4.poc.com 2015-05-02 18:39:08,152 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Blacklisted host h4.poc.com 2015-05-02 18:39:08,561 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Update the blacklist for application_1430425729977_0006: blacklistAdditions=1 blacklistRemovals=0 2015-05-02 18:39:08,561 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Ignore blacklisting set to true. 
Known: 4, Blacklisted: 1, 25% 2015-05-02 18:39:09,563 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Update the blacklist for application_1430425729977_0006: blacklistAdditions=0 blacklistRemovals=1 2015-05-02 18:39:32,912 FATAL [IPC Server handler 19 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_02_1 - exited : java.io.IOException: Spill failed 2015-05-02 18:39:35,076 FATAL [IPC Server handler 1 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_09_0 - exited : java.io.IOException: Spill failed 2015-05-02 18:39:35,133 FATAL [IPC Server handler 5 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_08_1 - exited : java.io.IOException: Spill failed 2015-05-02 18:39:57,308 FATAL [IPC Server handler 17 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_02_2 - exited : java.io.IOException: Spill failed 2015-05-02 18:40:00,174 FATAL [IPC Server handler 10 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1430425729977_0006_m_09_1 - exited : java.io.IOException: Spill failed 2015-05-02 18:40:00,227 FATAL [IPC Server handler 12 on 41696] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
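A standalone, hedged sketch of the behaviour being argued for: when ignore-blacklisting kicks in, the AM-side record should also be cleared so the same node can be re-blacklisted if it keeps failing. Field and method names below are invented for illustration and are not the real RMContainerRequestor members:
{code}
// Standalone illustration only -- not a patch for YARN-3571.
import java.util.HashSet;
import java.util.Set;

public class BlacklistSketch {
  private final Set<String> blacklistedNodes = new HashSet<String>();
  private final Set<String> blacklistRemovals = new HashSet<String>();

  // called when the blacklisted fraction crosses the disable threshold
  void ignoreBlacklistingNow() {
    blacklistRemovals.addAll(blacklistedNodes); // ask the RM to drop what we had added
    blacklistedNodes.clear();                   // the missing step: forget local state too,
                                                // so the host can be blacklisted again later
  }

  void containerFailedOnHost(String hostName, int failures, int threshold) {
    if (blacklistedNodes.contains(hostName)) {
      return; // already blacklisted
    }
    if (failures >= threshold) {
      blacklistedNodes.add(hostName);
    }
  }
}
{code}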
[jira] [Resolved] (YARN-556) [Umbrella] RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-556. -- Resolution: Fixed Assignee: (was: Bikas Saha) Makes sense. Resolved as fixed. Keeping it unassigned given multiple contributors. No fix-version given the tasks spanned across releases. [Umbrella] RM Restart phase 2 - Work preserving restart --- Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-556) [Umbrella] RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-556: - Summary: [Umbrella] RM Restart phase 2 - Work preserving restart (was: RM Restart phase 2 - Work preserving restart) [Umbrella] RM Restart phase 2 - Work preserving restart --- Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-59) Text File Busy errors launching MR tasks
[ https://issues.apache.org/jira/browse/YARN-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-59. - Resolution: Duplicate From what I know, this is fixed via YARN-1271 + YARN-1295. Resolving as dup. Please reopen if you disagree. Text File Busy errors launching MR tasks -- Key: YARN-59 URL: https://issues.apache.org/jira/browse/YARN-59 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Todd Lipcon Assignee: Andy Isaacson Some very small percentage of tasks fail with a Text file busy error. The following was the original diagnosis: {quote} Our use of PrintWriter in TaskController.writeCommand is unsafe, since that class swallows all IO exceptions. We're not currently checking for errors, which I'm seeing result in occasional task failures with the message Text file busy - assumedly because the close() call is failing silently for some reason. {quote} .. but turned out to be another issue as well (see below) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
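Since PrintWriter swallows IOExceptions, the only way to notice a failed write or close is to poll checkError(); below is a minimal illustration of that pattern, not the actual TaskController.writeCommand() code:
{code}
// Minimal illustration of surfacing PrintWriter errors; not Hadoop source.
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;

public class CheckedWrite {
  static void writeCommand(String cmd, String path) throws IOException {
    PrintWriter pw = new PrintWriter(new FileOutputStream(path));
    try {
      pw.println(cmd);
    } finally {
      pw.close();
    }
    // PrintWriter never throws; checkError() is the only way to see a failed write/close.
    if (pw.checkError()) {
      throw new IOException("Failed writing command file " + path);
    }
  }

  public static void main(String[] args) throws IOException {
    writeCommand("echo hello", "/tmp/task-command.sh");
  }
}
{code}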
[jira] [Updated] (YARN-3513) Remove unused variables in ContainersMonitorImpl
[ https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3513: -- Fix Version/s: (was: 2.8.0) Removing the fix-version; use the target-version field to specify your intent. Remove unused variables in ContainersMonitorImpl Key: YARN-3513 URL: https://issues.apache.org/jira/browse/YARN-3513 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Trivial Labels: newbie Attachments: YARN-3513.20150421-1.patch The class member {{private final Context context;}} and the local variables {{vmemStillInUsage}} and {{pmemStillInUsage}} in MonitoringThread.run() are never read, only updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1993) Cross-site scripting vulnerability in TextView.java
[ https://issues.apache.org/jira/browse/YARN-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525592#comment-14525592 ] Tsuyoshi Ozawa commented on YARN-1993: -- The javac and javadoc warnings are not related to this patch. Cross-site scripting vulnerability in TextView.java --- Key: YARN-1993 URL: https://issues.apache.org/jira/browse/YARN-1993 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Ted Yu Assignee: Kenji Kikushima Attachments: YARN-1993.patch In hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/TextView.java, the echo() method contains, e.g.: {code} for (Object s : args) { out.print(s); } {code} Printing s to an HTML page allows cross-site scripting, because the value is not properly sanitized for the HTML context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-148) CapacityScheduler shouldn't explicitly need YarnConfiguration
[ https://issues.apache.org/jira/browse/YARN-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-148. -- Resolution: Invalid I don't see this issue anymore, seems like it got resolved along the way. Resolving this old ticket. CapacityScheduler shouldn't explicitly need YarnConfiguration - Key: YARN-148 URL: https://issues.apache.org/jira/browse/YARN-148 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli This was done in MAPREDUCE-3773. None of our service APIs warrant YarnConfiguration. We affect the proper loading of yarn-site.xml by explicitly creating YarnConfiguration in all the main classes - ResourceManager, NodeManager etc. Due to this extra dependency, tests are failing, see https://builds.apache.org/job/PreCommit-YARN-Build/74//testReport/org.apache.hadoop.yarn.client/TestYarnClient/testClientStop/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-128) [Umbrella] RM Restart Phase 1: State storage and non-work-preserving recovery
[ https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-128. -- Resolution: Fixed Resolving this umbrella JIRA. RM recovery has largely been nearly complete/stable in YARN since this ticket was opened, what with its ultimate usage for rolling-upgrades (YARN-666). - As new issues come in, we can open new tickets. - Will leave the open sub-tasks as they are. - No fix-version as this was done across releases. [Umbrella] RM Restart Phase 1: State storage and non-work-preserving recovery - Key: YARN-128 URL: https://issues.apache.org/jira/browse/YARN-128 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.0.0-alpha Reporter: Arun C Murthy Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt, RMRestartPhase1.pdf, YARN-128.full-code-4.patch, YARN-128.full-code.3.patch, YARN-128.full-code.5.patch, YARN-128.new-code-added-4.patch, YARN-128.new-code-added.3.patch, YARN-128.old-code-removed.3.patch, YARN-128.old-code-removed.4.patch, YARN-128.patch, restart-12-11-zkstore.patch, restart-fs-store-11-17.patch, restart-zk-store-11-17.patch This umbrella jira tracks the work needed to preserve critical state information and reload them upon RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525627#comment-14525627 ] Hadoop QA commented on YARN-2331: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 2m 58s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12673407/YARN-2331v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e8d0ee5 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7666/console | This message was automatically generated. Distinguish shutdown during supervision vs. shutdown for rolling upgrade Key: YARN-2331 URL: https://issues.apache.org/jira/browse/YARN-2331 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2331.patch, YARN-2331v2.patch When the NM is shutting down with restart support enabled there are scenarios we'd like to distinguish and behave accordingly: # The NM is running under supervision. In that case containers should be preserved so the automatic restart can recover them. # The NM is not running under supervision and a rolling upgrade is not being performed. In that case the shutdown should kill all containers since it is unlikely the NM will be restarted in a timely manner to recover them. # The NM is not running under supervision and a rolling upgrade is being performed. In that case the shutdown should not kill all containers since a restart is imminent due to the rolling upgrade and the containers will be recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
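In code terms the three scenarios reduce to one decision at shutdown: should running containers be preserved? A hedged sketch of that decision follows; the configuration key names are assumptions made for illustration (only yarn.nodemanager.recovery.enabled is a key I would expect to already exist), not necessarily what the patch introduces:
{code}
// Illustrative decision logic only; key names below are assumptions, not from YARN-2331.
import org.apache.hadoop.conf.Configuration;

public class ShutdownPolicy {
  // Containers survive the shutdown if the NM is supervised (it will be restarted
  // automatically) or a rolling upgrade is in progress (a restart is imminent).
  static boolean preserveContainersOnShutdown(Configuration conf) {
    boolean recoveryEnabled = conf.getBoolean("yarn.nodemanager.recovery.enabled", false);
    boolean underSupervision = conf.getBoolean("yarn.nodemanager.recovery.supervised", false);
    boolean rollingUpgrade = conf.getBoolean("yarn.nodemanager.rolling-upgrade.in-progress", false);
    return recoveryEnabled && (underSupervision || rollingUpgrade);
  }
}
{code}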
[jira] [Commented] (YARN-2775) There is no close method in NMWebServices#getLogs()
[ https://issues.apache.org/jira/browse/YARN-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525641#comment-14525641 ] Hadoop QA commented on YARN-2775: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 37s | The applied patch generated 1 new checkstyle issues (total was 7, now 8). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 3s | The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 5m 51s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 41m 47s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-nodemanager | | | Nullcheck of NMWebServices$1.val$fis at line 251 of value previously dereferenced in org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices$1.write(OutputStream) At NMWebServices.java:251 of value previously dereferenced in org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices$1.write(OutputStream) At NMWebServices.java:[line 247] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12678151/YARN-2775_001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e8d0ee5 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7667/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7667/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7667/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7667/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7667/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7667/console | This message was automatically generated. 
There is no close method in NMWebServices#getLogs() --- Key: YARN-2775 URL: https://issues.apache.org/jira/browse/YARN-2775 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: skrho Priority: Minor Attachments: YARN-2775_001.patch If the getLogs method is called, a FileInputStream object accumulates in memory because the stream is never closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
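The leak described is the familiar unclosed-stream pattern: the FileInputStream has to be closed in a finally block once the log bytes have been written out. A hedged sketch of the idea, not necessarily what YARN-2775_001.patch does:
{code}
// Illustrative sketch of closing the stream in NMWebServices#getLogs-style code.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;

public class LogStreamer {
  static void streamLog(File logFile, OutputStream os) throws IOException {
    FileInputStream fis = new FileInputStream(logFile);
    try {
      byte[] buf = new byte[64 * 1024];
      int len;
      while ((len = fis.read(buf)) != -1) {
        os.write(buf, 0, len);
      }
      os.flush();
    } finally {
      fis.close(); // without this, one FileInputStream leaks per getLogs() request
    }
  }
}
{code}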
[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525560#comment-14525560 ] Hadoop QA commented on YARN-1426: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 13s | The applied patch generated 1 new checkstyle issues (total was 76, now 72). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 56s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 108m 26s | Tests passed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 52m 13s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 198m 7s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12637242/YARN-1426.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 6ae2a0d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7660/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7660/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7660/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7660/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7660/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7660/console | This message was automatically generated. YARN Components need to unregister their beans upon shutdown Key: YARN-1426 URL: https://issues.apache.org/jira/browse/YARN-1426 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1426.patch, YARN-1426.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-47) [Umbrella] Security issues in YARN
[ https://issues.apache.org/jira/browse/YARN-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-47: Assignee: (was: Vinod Kumar Vavilapalli) Summary: [Umbrella] Security issues in YARN (was: Security issues in YARN) Resolving this very old umbrella JIRA. Security (auth + authz) has been largely complete and stable in YARN since this ticket was opened. And as new requirements come in, we can open new tickets. - Will leave the open sub-tasks as they are. - Unassigning from me given multiple contributors on the tasks. - No fix-version as this was done across releases. [Umbrella] Security issues in YARN -- Key: YARN-47 URL: https://issues.apache.org/jira/browse/YARN-47 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli JIRA tracking YARN-related security issues. Moving over the YARN-only items from MAPREDUCE-3101. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-47) [Umbrella] Security issues in YARN
[ https://issues.apache.org/jira/browse/YARN-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-47. - Resolution: Fixed [Umbrella] Security issues in YARN -- Key: YARN-47 URL: https://issues.apache.org/jira/browse/YARN-47 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli JIRA tracking YARN related security issues. Moving over YARN only stuff from MAPREDUCE-3101. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-100) container-executor should deal with stdout, stderr better
[ https://issues.apache.org/jira/browse/YARN-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-100. -- Resolution: Later LOGFILE and ERRORFILE were always this way, and it has worked out for long enough. I don't see requests to change them to point them to other files, going to close it as later for now. Please revert back if you disagree. container-executor should deal with stdout, stderr better - Key: YARN-100 URL: https://issues.apache.org/jira/browse/YARN-100 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.1-alpha Reporter: Colin Patrick McCabe Priority: Minor container-executor.c contains the following code: {code} fclose(stdin); fflush(LOGFILE); if (LOGFILE != stdout) { fclose(stdout); } if (ERRORFILE != stderr) { fclose(stderr); } if (chdir(primary_app_dir) != 0) { fprintf(LOGFILE, Failed to chdir to app dir - %s\n, strerror(errno)); return -1; } execvp(args[0], args); {code} Whenever you open a new file descriptor, its number is the lowest available number. So if {{stdout}} (fd number 1) has been closed, and you do open(/my/important/file), you'll get assigned file descriptor 1. This means that any printf statements in the program will be now printing to /my/important/file. Oops! The correct way to get rid of stdin, stdout, or stderr is not to close them, but to make them point to /dev/null. {{dup2}} can be used for this purpose. It looks like LOGFILE and ERRORFILE are always set to stdout and stderr at the moment. However, this is a latent bug that should be fixed in case these are ever made configurable (which seems to have been the intent). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-149) [Umbrella] ResourceManager (RM) Fail-over
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-149: - Summary: [Umbrella] ResourceManager (RM) Fail-over (was: ResourceManager (RM) High-Availability (HA)) [Umbrella] ResourceManager (RM) Fail-over - Key: YARN-149 URL: https://issues.apache.org/jira/browse/YARN-149 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Harsh J Labels: patch Attachments: YARN ResourceManager Automatic Failover-rev-07-21-13.pdf, YARN ResourceManager Automatic Failover-rev-08-04-13.pdf, rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-149) [Umbrella] ResourceManager (RM) Fail-over
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-149. -- Resolution: Fixed Resolving this umbrella JIRA. RM failover has largely been complete/stable in YARN since this ticket was opened. And as new requirements/bugs come in, we can open new tickets. - Will leave the open sub-tasks as they are. - No fix-version as this was done across releases. [Umbrella] ResourceManager (RM) Fail-over - Key: YARN-149 URL: https://issues.apache.org/jira/browse/YARN-149 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Harsh J Labels: patch Attachments: YARN ResourceManager Automatic Failover-rev-07-21-13.pdf, YARN ResourceManager Automatic Failover-rev-08-04-13.pdf, rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-149) [Umbrella] ResourceManager (RM) Fail-over
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525595#comment-14525595 ] Vinod Kumar Vavilapalli commented on YARN-149: -- And tx to [~kasha] and [~xgong] for bulk of the work here.. [Umbrella] ResourceManager (RM) Fail-over - Key: YARN-149 URL: https://issues.apache.org/jira/browse/YARN-149 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Harsh J Labels: patch Attachments: YARN ResourceManager Automatic Failover-rev-07-21-13.pdf, YARN ResourceManager Automatic Failover-rev-08-04-13.pdf, rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-156) WebAppProxyServlet does not support http methods other than GET
[ https://issues.apache.org/jira/browse/YARN-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-156. -- Resolution: Duplicate Seems like there is more movement on this issue at YARN-2031. Given this, I am closing this as dup even though this was the earlier ticket to be created. Please revert back if you disagree. WebAppProxyServlet does not support http methods other than GET --- Key: YARN-156 URL: https://issues.apache.org/jira/browse/YARN-156 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.0.0-alpha Reporter: Thomas Weise Should support all methods so that applications can use it for full web service access to master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-501) Application Master getting killed randomly reporting excess usage of memory
[ https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-501. -- Resolution: Not A Problem Haven't gotten a response on my last comment in a while. IAC, it is unlikely YARN can do much in this situation. Closing this again as not-a-problem. Application Master getting killed randomly reporting excess usage of memory --- Key: YARN-501 URL: https://issues.apache.org/jira/browse/YARN-501 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell, nodemanager Affects Versions: 2.0.3-alpha Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi I am running a date command using the Distributed Shell example in a loop of 500 times. It ran successfully all the times except one time where it gave the following error. 2013-03-22 04:33:25,280 INFO [main] distributedshell.Client (Client.java:monitorApplication(605)) - Got application report from ASM for, appId=222, clientToken=null, appDiagnostics=Application application_1363938200742_0222 failed 1 times due to AM Container for appattempt_1363938200742_0222_01 exited with exitCode: 143 due to: Container [pid=21141,containerID=container_1363938200742_0222_01_01] is running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb physical memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing container. Dump of the process-tree for container_1363938200742_0222_01_01 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 21147 21141 21141 21141 (java) 244 12 532643840 11802 /home_/dsadm/yarn/jdk//bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 2 --priority 0 --shell_command date |- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c /home_/dsadm/yarn/jdk//bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 2 --priority 0 --shell_command date 1/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stdout 2/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_01/AppMaster.stderr -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-526) [Umbrella] Improve test coverage in YARN
[ https://issues.apache.org/jira/browse/YARN-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-526. -- Resolution: Fixed Assignee: Andrey Klochkov Assigning the umbrella also to [~aklochkov] and closing it as fixed as all sub-tasks currently present are done. [Umbrella] Improve test coverage in YARN Key: YARN-526 URL: https://issues.apache.org/jira/browse/YARN-526 Project: Hadoop YARN Issue Type: Task Reporter: Vinod Kumar Vavilapalli Assignee: Andrey Klochkov -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-526) [Umbrella] Improve test coverage in YARN
[ https://issues.apache.org/jira/browse/YARN-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525607#comment-14525607 ] Vinod Kumar Vavilapalli commented on YARN-526: -- And no fix-versions as sub-tasks spanned releases. [Umbrella] Improve test coverage in YARN Key: YARN-526 URL: https://issues.apache.org/jira/browse/YARN-526 Project: Hadoop YARN Issue Type: Task Reporter: Vinod Kumar Vavilapalli Assignee: Andrey Klochkov -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-543) [Umbrella] NodeManager localization related issues
[ https://issues.apache.org/jira/browse/YARN-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-543. -- Resolution: Fixed Resolving this very old umbrella JIRA. Most of the originally identified issues are resolved. And as new bugs come in, we can open new tickets. - Will leave the open sub-tasks as they are. - No fix-version as this was done across releases. [Umbrella] NodeManager localization related issues -- Key: YARN-543 URL: https://issues.apache.org/jira/browse/YARN-543 Project: Hadoop YARN Issue Type: Task Components: nodemanager Reporter: Vinod Kumar Vavilapalli Seeing a bunch of localization related issues being worked on, this is the tracking ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-128) [Umbrella] RM Restart Phase 1: State storage and non-work-preserving recovery
[ https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-128: - Summary: [Umbrella] RM Restart Phase 1: State storage and non-work-preserving recovery (was: RM Restart) [Umbrella] RM Restart Phase 1: State storage and non-work-preserving recovery - Key: YARN-128 URL: https://issues.apache.org/jira/browse/YARN-128 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.0.0-alpha Reporter: Arun C Murthy Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt, RMRestartPhase1.pdf, YARN-128.full-code-4.patch, YARN-128.full-code.3.patch, YARN-128.full-code.5.patch, YARN-128.new-code-added-4.patch, YARN-128.new-code-added.3.patch, YARN-128.old-code-removed.3.patch, YARN-128.old-code-removed.4.patch, YARN-128.patch, restart-12-11-zkstore.patch, restart-fs-store-11-17.patch, restart-zk-store-11-17.patch This umbrella jira tracks the work needed to preserve critical state information and reload them upon RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)