[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394200#comment-14394200 ]

Hadoop QA commented on YARN-3443:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12709194/YARN-3443.002.patch
against trunk revision 72f6bd4.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1150 javac compiler warnings (more than the trunk's current 1148 warnings).
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7211//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7211//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7211//console

This message is automatically generated.

Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

Key: YARN-3443
URL: https://issues.apache.org/jira/browse/YARN-3443
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
Attachments: YARN-3443.001.patch, YARN-3443.002.patch

The current cgroups implementation is closely tied to supporting CPU as a resource. We need to separate out CGroups support as well as provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM, e.g. network, disk, etc.
[jira] [Updated] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sidharta Seethana updated YARN-3443:
Attachment: YARN-3443.002.patch

Reattaching the patch with the findbugs warning fixed. Not sure what to make of the javac warnings here, however: https://builds.apache.org/job/PreCommit-YARN-Build/7210//artifact/patchprocess/diffJavacWarnings.txt

Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

Key: YARN-3443
URL: https://issues.apache.org/jira/browse/YARN-3443
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
Attachments: YARN-3443.001.patch, YARN-3443.002.patch

The current cgroups implementation is closely tied to supporting CPU as a resource. We need to separate out CGroups support as well as provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM, e.g. network, disk, etc.
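For a sense of the shape such a subsystem could take, here is a minimal sketch. The method set, names, and the placeholder PrivilegedOperation type are assumptions for illustration; the actual interface is whatever the attached patch defines.

{code}
import java.util.List;

import org.apache.hadoop.conf.Configuration;

/**
 * Illustrative sketch of a per-resource-type handler. Each lifecycle hook
 * returns a list of operations for container-executor to run with elevated
 * privileges (e.g. cgroup updates), so adding a new resource type (network,
 * disk, ...) means adding an implementation rather than editing the
 * CPU-specific cgroups code.
 */
public interface ResourceHandler {
  /** One-time setup for this resource type, e.g. mounting a cgroup controller. */
  List<PrivilegedOperation> bootstrap(Configuration conf);

  /** Called before a container launches, e.g. to create and assign its cgroup. */
  List<PrivilegedOperation> preStart(String containerId);

  /** Called after a container finishes, to release the resources it held. */
  List<PrivilegedOperation> postComplete(String containerId);

  /** NM shutdown hook for this resource type. */
  List<PrivilegedOperation> teardown();
}

/** Placeholder for a privileged operation; an assumption for this sketch. */
class PrivilegedOperation {
  final List<String> args;

  PrivilegedOperation(List<String> args) {
    this.args = args;
  }
}
{code}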
[jira] [Created] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
zhihai xu created YARN-3446:

Summary: FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
Key: YARN-3446
URL: https://issues.apache.org/jira/browse/YARN-3446
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Reporter: zhihai xu
Assignee: zhihai xu

FairScheduler HeadRoom calculation should exclude nodes in the blacklist. MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, headRoom includes blacklisted nodes. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResource the AM gets from the RM includes the blacklisted nodes' available resources). This issue is similar to YARN-1680, which covers the Capacity Scheduler.
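A minimal sketch of the intended calculation, with assumed names (the real fix lands in the FairScheduler headroom path, not in a standalone class like this):

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

/** Illustrative only: exclude blacklisted nodes' capacity before computing headroom. */
final class HeadroomSketch {
  static Resource headroom(Resource clusterCapacity, Resource blacklistedCapacity,
      Resource queueUsage) {
    // Usable capacity excludes nodes the application has blacklisted ...
    Resource usable = Resources.subtract(clusterCapacity, blacklistedCapacity);
    // ... and headroom is clamped so it never goes negative.
    return Resources.componentwiseMax(
        Resources.subtract(usable, queueUsage), Resources.none());
  }
}
{code}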
[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394249#comment-14394249 ]

Rohith commented on YARN-3410:

Attached the initial patch for removing individual applications from the state store.

YARN admin should be able to remove individual application records from RMStateStore

Key: YARN-3410
URL: https://issues.apache.org/jira/browse/YARN-3410
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager, yarn
Reporter: Wangda Tan
Assignee: Rohith
Priority: Critical
Attachments: 0001-YARN-3410-v1.patch

When the RM state store enters an unexpected state (one example is YARN-2340, where an attempt is not in a final state but the app has already completed), the RM can never come up unless the RMStateStore is formatted. I think we should support removing individual application records from the RMStateStore, to unblock the RM admin with a choice between waiting for a fix and formatting the state store. In addition, the RM should be able to report all fatal errors (which will shut down the RM) when doing app recovery; this can save the admin some time in removing apps in a bad state.
[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394248#comment-14394248 ]

Rohith commented on YARN-3410:

bq. what's the use case of using rmadmin removing a state while RM is running?

Practically, rmadmin need not remove the RM state store while the RM is running. I was thinking that if any exception happens during recovery, like YARN-2340, then the RM never exits; it keeps switching to standby and trying to become active. In this case, the admin could format the state store without stopping the RM.

bq. it's better that RM can log all errors of applications recovering before die. With this, admin can know which application states caused RM die.

I think it will be hard to tell which application caused the problem in case of RuntimeExceptions. The admin would need to backtrack through the exception in the logs to identify it.

YARN admin should be able to remove individual application records from RMStateStore

Key: YARN-3410
URL: https://issues.apache.org/jira/browse/YARN-3410
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager, yarn
Reporter: Wangda Tan
Assignee: Rohith
Priority: Critical
Attachments: 0001-YARN-3410-v1.patch

When the RM state store enters an unexpected state (one example is YARN-2340, where an attempt is not in a final state but the app has already completed), the RM can never come up unless the RMStateStore is formatted. I think we should support removing individual application records from the RMStateStore, to unblock the RM admin with a choice between waiting for a fix and formatting the state store. In addition, the RM should be able to report all fatal errors (which will shut down the RM) when doing app recovery; this can save the admin some time in removing apps in a bad state.
[jira] [Updated] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith updated YARN-3410:
Attachment: 0001-YARN-3410-v1.patch

YARN admin should be able to remove individual application records from RMStateStore

Key: YARN-3410
URL: https://issues.apache.org/jira/browse/YARN-3410
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager, yarn
Reporter: Wangda Tan
Assignee: Rohith
Priority: Critical
Attachments: 0001-YARN-3410-v1.patch

When the RM state store enters an unexpected state (one example is YARN-2340, where an attempt is not in a final state but the app has already completed), the RM can never come up unless the RMStateStore is formatted. I think we should support removing individual application records from the RMStateStore, to unblock the RM admin with a choice between waiting for a fix and formatting the state store. In addition, the RM should be able to report all fatal errors (which will shut down the RM) when doing app recovery; this can save the admin some time in removing apps in a bad state.
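A sketch of the admin-driven flow being proposed. The store interface and method names below are assumptions for illustration; the exact API is whatever the attached patch defines.

{code}
/** Illustrative only: remove one bad application record, leaving the rest intact. */
interface StateStoreAdmin {
  boolean containsApplication(String appId);

  void removeApplication(String appId) throws Exception;
}

final class RemoveAppRecordSketch {
  static void remove(StateStoreAdmin store, String appId) throws Exception {
    if (!store.containsApplication(appId)) {
      throw new IllegalArgumentException("No record for " + appId + " in the state store");
    }
    // Deleting only the offending application's record avoids formatting the
    // whole store, so every other application can still be recovered on restart.
    store.removeApplication(appId);
  }
}
{code}

As a design note, removing a record is safest while the RM is down, which matches the discussion above about recovery-time failures rather than live rmadmin operations.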
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394305#comment-14394305 ]

Hudson commented on YARN-2901:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #152 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/152/])
YARN-2901. Add errors and warning metrics page to RM, NM web UI. (Varun Vasudev via wangda) (wangda: rev bad070fe15a642cc6f3a165612fbd272187e03cb)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ErrorsAndWarningsBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMErrorsAndWarningsPage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMErrorsAndWarningsPage.java
* hadoop-common-project/hadoop-common/src/main/conf/log4j.properties

Add errors and warning metrics page to RM, NM web UI

Key: YARN-2901
URL: https://issues.apache.org/jira/browse/YARN-2901
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Fix For: 2.8.0
Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch

It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about:
1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 hours/day
By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender (I'm open to suggestions on alternate mechanisms for implementing this).
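The custom-appender approach from the description landed as the Log4jWarningErrorMetricsAppender in the commit above. A stripped-down sketch of the underlying idea, counting WARN/ERROR events in a log4j 1.x appender (this is not the committed code, just the core mechanism):

{code}
import java.util.concurrent.atomic.AtomicLong;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

/** Illustrative only: tally errors and warnings as they are logged. */
public class WarnErrorCountingAppender extends AppenderSkeleton {
  private final AtomicLong errors = new AtomicLong();
  private final AtomicLong warnings = new AtomicLong();

  @Override
  protected void append(LoggingEvent event) {
    // Classify by log level; a full implementation would also bucket counts
    // by time window and keep the most common messages for the web UI.
    if (Level.ERROR.equals(event.getLevel())) {
      errors.incrementAndGet();
    } else if (Level.WARN.equals(event.getLevel())) {
      warnings.incrementAndGet();
    }
  }

  public long getErrorCount() {
    return errors.get();
  }

  public long getWarningCount() {
    return warnings.get();
  }

  @Override
  public void close() {
  }

  @Override
  public boolean requiresLayout() {
    return false;
  }
}
{code}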
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394307#comment-14394307 ]

Hudson commented on YARN-3365:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #152 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/152/])
YARN-3365. Enhanced NodeManager to support using the 'tc' tool via container-executor for outbound network traffic control. Contributed by Sidharta Seethana. (vinodkv: rev b21c72777ae664b08fd1a93b4f88fa43f2478d94)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c

Add support for using the 'tc' tool via container-executor

Key: YARN-3365
URL: https://issues.apache.org/jira/browse/YARN-3365
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
Fix For: 2.8.0
Attachments: YARN-3365.001.patch, YARN-3365.002.patch, YARN-3365.003.patch

We need the following functionality:
1) Modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes, etc.
2) Read existing rules in place.
3) Read stats for the various classes.
Using tc requires elevated privileges, hence this functionality is to be made available via container-executor.
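As a concrete illustration of the three operations in the description, a minimal sketch follows. The device name (eth0) and rate are assumptions, and the committed code routes these invocations through container-executor rather than spawning tc directly:

{code}
import java.io.IOException;

/** Illustrative only: the kinds of tc invocations involved in traffic shaping. */
final class TrafficControlSketch {
  static void run(String... cmd) throws IOException, InterruptedException {
    // Requires root; in YARN, that privilege boundary is container-executor.
    new ProcessBuilder(cmd).inheritIO().start().waitFor();
  }

  public static void main(String[] args) throws Exception {
    // 1) Attach a root HTB qdisc and a child class that caps the outbound rate.
    run("tc", "qdisc", "add", "dev", "eth0", "root", "handle", "1:", "htb");
    run("tc", "class", "add", "dev", "eth0", "parent", "1:",
        "classid", "1:1", "htb", "rate", "100mbit");
    // 2) Read back the rules in place.
    run("tc", "qdisc", "show", "dev", "eth0");
    // 3) Read per-class statistics.
    run("tc", "-s", "class", "show", "dev", "eth0");
  }
}
{code}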
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394313#comment-14394313 ]

Hudson commented on YARN-3415:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #152 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/152/])
YARN-3415. Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue (Zhihai Xu via Sandy Ryza) (sandy: rev 6a6a59db7f1bfda47c3c14fb49676a7b22d2eb06)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java

Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

Key: YARN-3415
URL: https://issues.apache.org/jira/browse/YARN-3415
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
Fix For: 2.8.0
Attachments: YARN-3415.000.patch, YARN-3415.001.patch, YARN-3415.002.patch

We encountered this problem while running a Spark cluster. The amResourceUsage for a queue became artificially high, and then the cluster got deadlocked because the maxAMShare constraint kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289
In summary, the condition for adding a container's memory towards amResourceUsage is fragile: it depends on the number of live containers belonging to the app. We saw that the Spark AM went down without explicitly releasing its requested containers, and then one of those containers' memory was counted towards amResource.
cc - [~sandyr]
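A minimal sketch contrasting the fragile inference with an explicit-state check. All names here are assumed for illustration; the actual change is in the FSAppAttempt/FSLeafQueue files listed above:

{code}
/** Illustrative only: charge amResourceUsage from explicit AM state, not container counts. */
final class AmShareSketch {
  private final boolean unmanagedAm;
  private boolean amRunning = false;
  private long amResourceUsageMb = 0;

  AmShareSketch(boolean unmanagedAm) {
    this.unmanagedAm = unmanagedAm;
  }

  /** Called when a container is allocated to this app attempt. */
  void onContainerAllocated(long containerMb) {
    // The fragile variant (paraphrasing the bug) used an empty live-container
    // set as a stand-in for "this must be the AM container", which breaks when
    // an AM dies while its earlier container requests are still outstanding.
    // Tracking AM liveness explicitly charges only the container that actually
    // launches the AM (unmanaged AMs have no RM-launched AM container at all).
    if (!amRunning && !unmanagedAm) {
      amResourceUsageMb += containerMb;
      amRunning = true;
    }
  }

  long getAmResourceUsageMb() {
    return amResourceUsageMb;
  }
}
{code}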
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394149#comment-14394149 ]

Naganarasimha G R commented on YARN-2729:

bq. Revisited interval, I think it's better to make it to be provider configuration instead of script-provider-only configuration. Since config/script will share it (I remember I have some back-and-forth opinions here).

:) Agree, I don't mind redoing it, as long as it's for a better reason, and I was expecting changes here anyway. The other comments on configuration will be taken care of.

bq. I feel like ScriptBased and ConfigBased can share some implementations, they will all init a timer task, get interval and run, check timeout (meaningless for config-based), etc. Can you make an abstract class and inherited by ScriptBased?

I can do this (which I feel is correct), but if we do, it might not be possible to generalize much between NodeHealthScriptRunner and ScriptBasedNodeLabelsProvider, which I feel should be OK.

bq. checkAndThrowLabelName should be called in NodeStatusUpdaterImpl

In a way it would be better in NodeStatusUpdaterImpl, since we support an external class as a provider, but earlier I thought it would not be good to add additional checks as part of the heartbeat flow.

bq. label need to be trim() when called checkAndThrowLabelName(...)

Not required, as checkAndThrowLabelName takes care of it, but the test case is missing; will add it for NodeStatusUpdaterImpl.

The other issues will be reworked in the next patch. (A sketch of the abstract-class refactoring under discussion follows below.)

Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

Key: YARN-2729
URL: https://issues.apache.org/jira/browse/YARN-2729
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Fix For: 2.8.0
Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, YARN-2729.20150402-1.patch

Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup.
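As referenced in the comment above, a sketch of the shared base class being discussed. All names are assumed; the point is only that the timer/interval plumbing can live in one place while script-based and config-based providers differ in how labels are fetched:

{code}
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

/**
 * Illustrative only: common scaffolding for node-labels providers. Subclasses
 * supply fetchNodeLabels(); the base class owns the periodic refresh.
 */
abstract class AbstractNodeLabelsProvider {
  private final Timer timer = new Timer("NodeLabelsProvider", true);
  private volatile Set<String> nodeLabels;

  /** Subclasses fetch labels from a script, a config file, etc. */
  protected abstract Set<String> fetchNodeLabels() throws Exception;

  void start(long intervalMs) {
    timer.scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
        try {
          nodeLabels = fetchNodeLabels();
        } catch (Exception e) {
          // Keep the last good labels; a real provider would log here and,
          // for the script-based case, also enforce a script timeout.
        }
      }
    }, 0, intervalMs);
  }

  Set<String> getNodeLabels() {
    return nodeLabels;
  }

  void stop() {
    timer.cancel();
  }
}
{code}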
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394327#comment-14394327 ]

Hudson commented on YARN-2901:

FAILURE: Integrated in Hadoop-Yarn-trunk #886 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/886/])
YARN-2901. Add errors and warning metrics page to RM, NM web UI. (Varun Vasudev via wangda) (wangda: rev bad070fe15a642cc6f3a165612fbd272187e03cb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMErrorsAndWarningsPage.java
* hadoop-common-project/hadoop-common/src/main/conf/log4j.properties
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ErrorsAndWarningsBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMErrorsAndWarningsPage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NavBlock.java

Add errors and warning metrics page to RM, NM web UI

Key: YARN-2901
URL: https://issues.apache.org/jira/browse/YARN-2901
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Fix For: 2.8.0
Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch

It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about:
1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 hours/day
By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender (I'm open to suggestions on alternate mechanisms for implementing this).
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394329#comment-14394329 ]

Hudson commented on YARN-3365:

FAILURE: Integrated in Hadoop-Yarn-trunk #886 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/886/])
YARN-3365. Enhanced NodeManager to support using the 'tc' tool via container-executor for outbound network traffic control. Contributed by Sidharta Seethana. (vinodkv: rev b21c72777ae664b08fd1a93b4f88fa43f2478d94)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c

Add support for using the 'tc' tool via container-executor

Key: YARN-3365
URL: https://issues.apache.org/jira/browse/YARN-3365
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
Fix For: 2.8.0
Attachments: YARN-3365.001.patch, YARN-3365.002.patch, YARN-3365.003.patch

We need the following functionality:
1) Modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes, etc.
2) Read existing rules in place.
3) Read stats for the various classes.
Using tc requires elevated privileges, hence this functionality is to be made available via container-executor.
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394335#comment-14394335 ]

Hudson commented on YARN-3415:

FAILURE: Integrated in Hadoop-Yarn-trunk #886 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/886/])
YARN-3415. Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue (Zhihai Xu via Sandy Ryza) (sandy: rev 6a6a59db7f1bfda47c3c14fb49676a7b22d2eb06)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt

Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

Key: YARN-3415
URL: https://issues.apache.org/jira/browse/YARN-3415
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
Fix For: 2.8.0
Attachments: YARN-3415.000.patch, YARN-3415.001.patch, YARN-3415.002.patch

We encountered this problem while running a Spark cluster. The amResourceUsage for a queue became artificially high, and then the cluster got deadlocked because the maxAMShare constraint kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289
In summary, the condition for adding a container's memory towards amResourceUsage is fragile: it depends on the number of live containers belonging to the app. We saw that the Spark AM went down without explicitly releasing its requested containers, and then one of those containers' memory was counted towards amResource.
cc - [~sandyr]
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394546#comment-14394546 ]

Do Hoai Nam commented on YARN-2140:

For the case of ingress traffic, you can check our solution in YARN-2681 (Support bandwidth enforcement for containers while reading from HDFS), https://issues.apache.org/jira/browse/YARN-2681, and the related paper: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf

Add support for network IO isolation/scheduling for containers

Key: YARN-2140
URL: https://issues.apache.org/jira/browse/YARN-2140
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan
Attachments: NetworkAsAResourceDesign.pdf
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394507#comment-14394507 ]

Hudson commented on YARN-3415:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #143 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/143/])
YARN-3415. Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue (Zhihai Xu via Sandy Ryza) (sandy: rev 6a6a59db7f1bfda47c3c14fb49676a7b22d2eb06)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java

Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

Key: YARN-3415
URL: https://issues.apache.org/jira/browse/YARN-3415
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
Fix For: 2.8.0
Attachments: YARN-3415.000.patch, YARN-3415.001.patch, YARN-3415.002.patch

We encountered this problem while running a Spark cluster. The amResourceUsage for a queue became artificially high, and then the cluster got deadlocked because the maxAMShare constraint kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289
In summary, the condition for adding a container's memory towards amResourceUsage is fragile: it depends on the number of live containers belonging to the app. We saw that the Spark AM went down without explicitly releasing its requested containers, and then one of those containers' memory was counted towards amResource.
cc - [~sandyr]
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394499#comment-14394499 ]

Hudson commented on YARN-2901:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #143 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/143/])
YARN-2901. Add errors and warning metrics page to RM, NM web UI. (Varun Vasudev via wangda) (wangda: rev bad070fe15a642cc6f3a165612fbd272187e03cb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
* hadoop-common-project/hadoop-common/src/main/conf/log4j.properties
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ErrorsAndWarningsBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMErrorsAndWarningsPage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMErrorsAndWarningsPage.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java

Add errors and warning metrics page to RM, NM web UI

Key: YARN-2901
URL: https://issues.apache.org/jira/browse/YARN-2901
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Fix For: 2.8.0
Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch

It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about:
1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 hours/day
By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender (I'm open to suggestions on alternate mechanisms for implementing this).
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394501#comment-14394501 ]

Hudson commented on YARN-3365:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #143 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/143/])
YARN-3365. Enhanced NodeManager to support using the 'tc' tool via container-executor for outbound network traffic control. Contributed by Sidharta Seethana. (vinodkv: rev b21c72777ae664b08fd1a93b4f88fa43f2478d94)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c

Add support for using the 'tc' tool via container-executor

Key: YARN-3365
URL: https://issues.apache.org/jira/browse/YARN-3365
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
Fix For: 2.8.0
Attachments: YARN-3365.001.patch, YARN-3365.002.patch, YARN-3365.003.patch

We need the following functionality:
1) Modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes, etc.
2) Read existing rules in place.
3) Read stats for the various classes.
Using tc requires elevated privileges, hence this functionality is to be made available via container-executor.
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394522#comment-14394522 ]

Hudson commented on YARN-3415:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2084 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2084/])
YARN-3415. Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue (Zhihai Xu via Sandy Ryza) (sandy: rev 6a6a59db7f1bfda47c3c14fb49676a7b22d2eb06)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

Key: YARN-3415
URL: https://issues.apache.org/jira/browse/YARN-3415
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
Fix For: 2.8.0
Attachments: YARN-3415.000.patch, YARN-3415.001.patch, YARN-3415.002.patch

We encountered this problem while running a Spark cluster. The amResourceUsage for a queue became artificially high, and then the cluster got deadlocked because the maxAMShare constraint kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289
In summary, the condition for adding a container's memory towards amResourceUsage is fragile: it depends on the number of live containers belonging to the app. We saw that the Spark AM went down without explicitly releasing its requested containers, and then one of those containers' memory was counted towards amResource.
cc - [~sandyr]
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394514#comment-14394514 ]

Hudson commented on YARN-2901:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2084 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2084/])
YARN-2901. Add errors and warning metrics page to RM, NM web UI. (Varun Vasudev via wangda) (wangda: rev bad070fe15a642cc6f3a165612fbd272187e03cb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMErrorsAndWarningsPage.java
* hadoop-common-project/hadoop-common/src/main/conf/log4j.properties
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMErrorsAndWarningsPage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ErrorsAndWarningsBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
* hadoop-yarn-project/CHANGES.txt

Add errors and warning metrics page to RM, NM web UI

Key: YARN-2901
URL: https://issues.apache.org/jira/browse/YARN-2901
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Fix For: 2.8.0
Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch

It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about:
1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 hours/day
By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender (I'm open to suggestions on alternate mechanisms for implementing this).
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394516#comment-14394516 ]

Hudson commented on YARN-3365:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2084 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2084/])
YARN-3365. Enhanced NodeManager to support using the 'tc' tool via container-executor for outbound network traffic control. Contributed by Sidharta Seethana. (vinodkv: rev b21c72777ae664b08fd1a93b4f88fa43f2478d94)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h

Add support for using the 'tc' tool via container-executor

Key: YARN-3365
URL: https://issues.apache.org/jira/browse/YARN-3365
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
Fix For: 2.8.0
Attachments: YARN-3365.001.patch, YARN-3365.002.patch, YARN-3365.003.patch

We need the following functionality:
1) Modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes, etc.
2) Read existing rules in place.
3) Read stats for the various classes.
Using tc requires elevated privileges, hence this functionality is to be made available via container-executor.
[jira] [Updated] (YARN-3444) Fixed typo (capability)
[ https://issues.apache.org/jira/browse/YARN-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Liptak updated YARN-3444:
Attachment: YARN-3444.patch

Fixed typo (capability)

Key: YARN-3444
URL: https://issues.apache.org/jira/browse/YARN-3444
Project: Hadoop YARN
Issue Type: Improvement
Components: applications/distributed-shell
Reporter: Gabor Liptak
Priority: Minor
Attachments: YARN-3444.patch

Fixed typo (capability)
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-1680:
Component/s: capacityscheduler

availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

Key: YARN-1680
URL: https://issues.apache.org/jira/browse/YARN-1680
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.2.0, 2.3.0
Environment: SuSE 11 SP2 + Hadoop-2.3
Reporter: Rohith
Assignee: Chen He
Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch

There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. Cluster slow start is set to 1. A job's running reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) becomes unstable (3 maps got killed), so the MRAppMaster blacklists the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, headRoom includes blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResource it returns considers cluster free memory).
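Working through the numbers in the description: headroom computed against full capacity is 32GB - 29GB = 3GB, and that reported headroom includes NM-4's free memory even though the RM will never place this job's containers there. The AM therefore sees nonzero headroom, concludes a map can still be scheduled without preemption, never preempts a reducer, and the job hangs.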
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394822#comment-14394822 ]

Hudson commented on YARN-3415:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #153 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/153/])
YARN-3415. Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue (Zhihai Xu via Sandy Ryza) (sandy: rev 6a6a59db7f1bfda47c3c14fb49676a7b22d2eb06)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

Key: YARN-3415
URL: https://issues.apache.org/jira/browse/YARN-3415
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
Fix For: 2.8.0
Attachments: YARN-3415.000.patch, YARN-3415.001.patch, YARN-3415.002.patch

We encountered this problem while running a Spark cluster. The amResourceUsage for a queue became artificially high, and then the cluster got deadlocked because the maxAMShare constraint kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289
In summary, the condition for adding a container's memory towards amResourceUsage is fragile: it depends on the number of live containers belonging to the app. We saw that the Spark AM went down without explicitly releasing its requested containers, and then one of those containers' memory was counted towards amResource.
cc - [~sandyr]
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394816#comment-14394816 ]

Hudson commented on YARN-3365:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #153 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/153/])
YARN-3365. Enhanced NodeManager to support using the 'tc' tool via container-executor for outbound network traffic control. Contributed by Sidharta Seethana. (vinodkv: rev b21c72777ae664b08fd1a93b4f88fa43f2478d94)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* hadoop-yarn-project/CHANGES.txt

Add support for using the 'tc' tool via container-executor

Key: YARN-3365
URL: https://issues.apache.org/jira/browse/YARN-3365
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
Fix For: 2.8.0
Attachments: YARN-3365.001.patch, YARN-3365.002.patch, YARN-3365.003.patch

We need the following functionality:
1) Modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes, etc.
2) Read existing rules in place.
3) Read stats for the various classes.
Using tc requires elevated privileges, hence this functionality is to be made available via container-executor.
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394814#comment-14394814 ]

Hudson commented on YARN-2901:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #153 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/153/])
YARN-2901. Add errors and warning metrics page to RM, NM web UI. (Varun Vasudev via wangda) (wangda: rev bad070fe15a642cc6f3a165612fbd272187e03cb)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
* hadoop-common-project/hadoop-common/src/main/conf/log4j.properties
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ErrorsAndWarningsBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMErrorsAndWarningsPage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMErrorsAndWarningsPage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java

Add errors and warning metrics page to RM, NM web UI

Key: YARN-2901
URL: https://issues.apache.org/jira/browse/YARN-2901
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Fix For: 2.8.0
Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch

It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about:
1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 hours/day
By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender (I'm open to suggestions on alternate mechanisms for implementing this).
[jira] [Updated] (YARN-3444) Fixed typo (capability)
[ https://issues.apache.org/jira/browse/YARN-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Liptak updated YARN-3444:
Target Version/s: 2.6.1

Fixed typo (capability)

Key: YARN-3444
URL: https://issues.apache.org/jira/browse/YARN-3444
Project: Hadoop YARN
Issue Type: Improvement
Components: applications/distributed-shell
Reporter: Gabor Liptak
Priority: Minor
Attachments: YARN-3444.patch

Fixed typo (capability)
[jira] [Updated] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Yan updated YARN-2140:
Assignee: Sidharta Seethana (was: Wei Yan)

Add support for network IO isolation/scheduling for containers

Key: YARN-2140
URL: https://issues.apache.org/jira/browse/YARN-2140
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Wei Yan
Assignee: Sidharta Seethana
Attachments: NetworkAsAResourceDesign.pdf
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vrushali C updated YARN-3411:
Attachment: ATSv2BackendHBaseSchemaproposal.pdf

Attaching the schema proposal for storing ATS information in HBase. I also have example queries listed and a basic explanation of the UI design. Feedback is welcome!

[Storage implementation] explore the native HBase write schema for storage

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf

There is work in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we can evaluate them in terms of performance, scalability, usability, etc. and make a call.
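For a flavor of what a native HBase write path looks like, a minimal sketch using the plain HBase client API. The table name, column family, and row-key layout below are illustrative assumptions, not the proposal's actual schema:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Illustrative only: writing one timeline entity directly via the HBase client. */
final class EntityWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("timeline_entity"))) {
      // Row key composed from scoping fields so reads can scan by prefix.
      byte[] rowKey = Bytes.toBytes("cluster!user!flow!1!app_1!TYPE!entity_1");
      Put put = new Put(rowKey);
      put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("created_time"),
          Bytes.toBytes(System.currentTimeMillis()));
      table.put(put);
    }
  }
}
{code}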
[jira] [Updated] (YARN-81) Make sure YARN declares correct set of dependencies
[ https://issues.apache.org/jira/browse/YARN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-81:
Attachment: YARN-81.patch

Uploaded a patch to fix most of the warnings. Some "Unused declared dependencies found" warnings are still there because Maven fails to detect the usage of those dependencies, and removing them would cause compile/test failures.

Make sure YARN declares correct set of dependencies

Key: YARN-81
URL: https://issues.apache.org/jira/browse/YARN-81
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.0.0-alpha
Reporter: Tom White
Assignee: Junping Du
Attachments: YARN-81.patch

This is the equivalent of HADOOP-8278 for YARN.
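The warnings quoted above are the output format of the Maven dependency plugin's analyze goal (presumably the same mechanism HADOOP-8278 used); they can be reproduced locally with:

{code}
# Run from the project root; reports "Used undeclared" and "Unused declared" dependencies.
mvn dependency:analyze
{code}

The "Unused declared" report is heuristic: it is based on bytecode analysis of class references, so dependencies used only via reflection or at runtime are misreported, which is why some warnings are deliberately left in place here.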
[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently
[ https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394748#comment-14394748 ] zhihai xu commented on YARN-2666: - Thanks [~ozawa]! TestFairScheduler.testContinuousScheduling fails Intermittently --- Key: YARN-2666 URL: https://issues.apache.org/jira/browse/YARN-2666 Project: Hadoop YARN Issue Type: Test Components: scheduler Reporter: Tsuyoshi Ozawa Assignee: zhihai xu Attachments: YARN-2666.000.patch The test fails on trunk.
{code}
Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.582 sec <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394869#comment-14394869 ] Hudson commented on YARN-3365: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2102 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2102/]) YARN-3365. Enhanced NodeManager to support using the 'tc' tool via container-executor for outbound network traffic control. Contributed by Sidharta Seethana. (vinodkv: rev b21c72777ae664b08fd1a93b4f88fa43f2478d94)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
Add support for using the 'tc' tool via container-executor -- Key: YARN-3365 URL: https://issues.apache.org/jira/browse/YARN-3365 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Fix For: 2.8.0 Attachments: YARN-3365.001.patch, YARN-3365.002.patch, YARN-3365.003.patch
We need the following functionality:
1) modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes, etc.
2) read existing rules in place
3) read stats for the various classes
Using tc requires elevated privileges - hence this functionality is to be made available via container-executor.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394876#comment-14394876 ] Hudson commented on YARN-3415: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2102 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2102/]) YARN-3415. Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue (Zhihai Xu via Sandy Ryza) (sandy: rev 6a6a59db7f1bfda47c3c14fb49676a7b22d2eb06)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue -- Key: YARN-3415 URL: https://issues.apache.org/jira/browse/YARN-3415 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: zhihai xu Priority: Critical Fix For: 2.8.0 Attachments: YARN-3415.000.patch, YARN-3415.001.patch, YARN-3415.002.patch
We encountered this problem while running a Spark cluster. The amResourceUsage for a queue became artificially high, and then the cluster got deadlocked because the maxAMShare constraint kicked in and no new AM got admitted to the cluster. I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289 In summary - the condition for adding the container's memory towards amResourceUsage is fragile. It depends on the number of live containers belonging to the app. We saw that the Spark AM went down without explicitly releasing its requested containers, and then one of those containers' memory was counted towards amResourceUsage. cc - [~sandyr]
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
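The committed fix isn't shown in this digest, but the fragile-condition problem described above can be sketched: instead of inferring "this is the AM container" from the live-container count, track the AM container explicitly so a non-AM container can never be charged. Names below are illustrative, not the actual FSAppAttempt/FSLeafQueue code.
{code}
// Sketch: charge amResourceUsage only for a container explicitly marked as
// the AM container, and at most once per application attempt.
class AmShareBookkeepingSketch {
  private boolean amCharged = false;

  void onContainerAllocated(boolean isAmContainer, long containerMemMb,
      QueueUsage queue) {
    if (isAmContainer && !amCharged) {
      amCharged = true;
      queue.addAmResourceUsageMb(containerMemMb); // charged exactly once
    }
    // Non-AM containers never touch amResourceUsage, no matter how many
    // live containers the app happens to have at this moment.
  }

  interface QueueUsage {
    void addAmResourceUsageMb(long mb);
  }
}
{code}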
[jira] [Updated] (YARN-81) Make sure YARN declares correct set of dependencies
[ https://issues.apache.org/jira/browse/YARN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-81: --- Attachment: YARN-81-v2.patch Fixed a minor format issue in the v2 patch. Make sure YARN declares correct set of dependencies --- Key: YARN-81 URL: https://issues.apache.org/jira/browse/YARN-81 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Tom White Assignee: Junping Du Attachments: YARN-81-v2.patch, YARN-81.patch This is the equivalent of HADOOP-8278 for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-81) Make sure YARN declares correct set of dependencies
[ https://issues.apache.org/jira/browse/YARN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-81: -- Assignee: Junping Du Make sure YARN declares correct set of dependencies --- Key: YARN-81 URL: https://issues.apache.org/jira/browse/YARN-81 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Tom White Assignee: Junping Du This is the equivalent of HADOOP-8278 for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-81) Make sure YARN declares correct set of dependencies
[ https://issues.apache.org/jira/browse/YARN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394619#comment-14394619 ] Junping Du commented on YARN-81: It has been open for a long time. Assigning it to myself to work on it. Make sure YARN declares correct set of dependencies --- Key: YARN-81 URL: https://issues.apache.org/jira/browse/YARN-81 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Tom White This is the equivalent of HADOOP-8278 for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3447) Dodgy code Warnings in org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender
[ https://issues.apache.org/jira/browse/YARN-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula moved MAPREDUCE-6306 to YARN-3447: --- Key: YARN-3447 (was: MAPREDUCE-6306) Project: Hadoop YARN (was: Hadoop Map/Reduce) Dodgy code Warnings in org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender Key: YARN-3447 URL: https://issues.apache.org/jira/browse/YARN-3447 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula *Dodgy code Warnings* UrF Unread public/protected field: org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender$Element.count Bug type URF_UNREAD_PUBLIC_OR_PROTECTED_FIELD In class org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender$Element Field org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender$Element.count At Log4jWarningErrorMetricsAppender.java:[line 44] UrF Unread public/protected field: org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender$Element.timestampSeconds Bug type URF_UNREAD_PUBLIC_OR_PROTECTED_FIELD In class org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender$Element Field org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender$Element.timestampSeconds At Log4jWarningErrorMetricsAppender.java:[line 45] Please find more details here... https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5371//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
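For context, URF_UNREAD_PUBLIC_OR_PROTECTED_FIELD fires on fields that are assigned but never read from any Java code. A minimal reproduction of the flagged shape (illustrative, not the actual Element class):
{code}
public class Element {
  public long count;            // written below, never read -> FindBugs UrF
  public long timestampSeconds; // written below, never read -> FindBugs UrF

  public Element(long count, long timestampSeconds) {
    this.count = count;
    this.timestampSeconds = timestampSeconds;
  }
}
{code}
Typical remedies are to read the fields through accessors, narrow their visibility, or suppress the warning when they are consumed reflectively (for example by a JSON serializer), which FindBugs cannot see.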
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394867#comment-14394867 ] Hudson commented on YARN-2901: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2102 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2102/]) YARN-2901. Add errors and warning metrics page to RM, NM web UI. (Varun Vasudev via wangda) (wangda: rev bad070fe15a642cc6f3a165612fbd272187e03cb)
* hadoop-common-project/hadoop-common/src/main/conf/log4j.properties
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMErrorsAndWarningsPage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ErrorsAndWarningsBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMErrorsAndWarningsPage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
Add errors and warning metrics page to RM, NM web UI Key: YARN-2901 URL: https://issues.apache.org/jira/browse/YARN-2901 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
It would be really useful to have statistics on the number of errors and warnings in the RM and NM web UI. I'm thinking about:
1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 hours/day
By errors and warnings I'm referring to the log level. I suspect we can probably achieve this by writing a custom appender? (I'm open to suggestions on alternate mechanisms for implementing this.)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394891#comment-14394891 ] Hadoop QA commented on YARN-2003: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708208/0005-YARN-2003.patch against trunk revision db80e42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7213//console This message is automatically generated. Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side] -- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from the Submission Context and store it. Later this can be used by the Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
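The idea in the description is simply to make the scheduler event carry the priority taken from the submission context. A minimal sketch of that shape, with the class reduced to the relevant fields (the real event carries more, and the real ApplicationSubmissionContext#getPriority() returns a Priority object rather than an int):
{code}
// Illustrative only: an event that stores the job priority at creation time
// so the scheduler can read it when the app attempt is added.
class AppAttemptAddedEventSketch {
  private final String appAttemptId;
  private final int priority; // simplified; YARN wraps this in Priority

  AppAttemptAddedEventSketch(String appAttemptId, int priority) {
    this.appAttemptId = appAttemptId;
    this.priority = priority;
  }

  String getAppAttemptId() { return appAttemptId; }
  int getPriority() { return priority; }
}
{code}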
[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2004: -- Attachment: 0005-YARN-2004.patch Uploading CS changes. Hi [~leftnoteasy], YARN-2004 needs to have some changes in CS and LeafQueue, but dummy implementations of the same methods are added in YARN-2003. Hence this will depend on YARN-2003, but not the opposite. Kindly share your opinion. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below:
1. Check for Application priority. If priority is available, then return the highest priority job.
2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
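The two-step comparison in the description maps directly onto a comparator; a minimal sketch follows, with FiCaSchedulerApp reduced to the three fields the comparison needs.
{code}
import java.util.Comparator;

// 1. Higher priority wins; 2. otherwise fall back to the existing FIFO
// ordering of app id, then submission timestamp.
class PriorityThenFifoComparator implements Comparator<App> {
  @Override
  public int compare(App a1, App a2) {
    if (a1.priority != a2.priority) {
      return Integer.compare(a2.priority, a1.priority); // highest priority first
    }
    if (a1.appId != a2.appId) {
      return Long.compare(a1.appId, a2.appId); // earlier app id first
    }
    return Long.compare(a1.submitTimestamp, a2.submitTimestamp);
  }
}

class App {
  int priority;
  long appId;
  long submitTimestamp;
}
{code}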
[jira] [Resolved] (YARN-329) yarn CHANGES.txt link missing from docs Reference
[ https://issues.apache.org/jira/browse/YARN-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-329. - Resolution: Fixed Fix Version/s: 2.6.0 This got fixed in the 2.6.0 release. Marking it as resolved. yarn CHANGES.txt link missing from docs Reference - Key: YARN-329 URL: https://issues.apache.org/jira/browse/YARN-329 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Thomas Graves Priority: Minor Fix For: 2.6.0 Looking at the hadoop 0.23 docs: http://hadoop.apache.org/docs/r0.23.5/ There is no link to the yarn CHANGES.txt in the Reference menu on the left side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-329) yarn CHANGES.txt link missing from docs Reference
[ https://issues.apache.org/jira/browse/YARN-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-329: -- Fix Version/s: (was: 2.6.0) yarn CHANGES.txt link missing from docs Reference - Key: YARN-329 URL: https://issues.apache.org/jira/browse/YARN-329 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Thomas Graves Priority: Minor Looking at the hadoop 0.23 docs: http://hadoop.apache.org/docs/r0.23.5/ There is no link to the yarn CHANGES.txt in the Reference menu on the left side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2969) allocate resource on different nodes for task
[ https://issues.apache.org/jira/browse/YARN-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-2969. -- Resolution: Duplicate allocate resource on different nodes for task - Key: YARN-2969 URL: https://issues.apache.org/jira/browse/YARN-2969 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Yang Hao With the help of Slider, YARN will become a common resource-managing OS, and some applications would like to allocate containers (or components, in Slider terms) on different nodes, so a configuration for allocating resources on different nodes would be helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-81) Make sure YARN declares correct set of dependencies
[ https://issues.apache.org/jira/browse/YARN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395043#comment-14395043 ] Hadoop QA commented on YARN-81: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709287/YARN-81.patch against trunk revision db80e42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7214//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7214//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7214//console This message is automatically generated. Make sure YARN declares correct set of dependencies --- Key: YARN-81 URL: https://issues.apache.org/jira/browse/YARN-81 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Tom White
[jira] [Commented] (YARN-81) Make sure YARN declares correct set of dependencies
[ https://issues.apache.org/jira/browse/YARN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395056#comment-14395056 ] Hadoop QA commented on YARN-81: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709290/YARN-81-v2.patch against trunk revision db80e42. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7215//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7215//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7215//console This message is automatically generated. Make sure YARN declares correct set of dependencies --- Key: YARN-81 URL: https://issues.apache.org/jira/browse/YARN-81 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Tom
[jira] [Resolved] (YARN-374) Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication
[ https://issues.apache.org/jira/browse/YARN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-374. - Resolution: Not a Problem Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication -- Key: YARN-374 URL: https://issues.apache.org/jira/browse/YARN-374 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Affects Versions: 2.0.1-alpha Reporter: Nemon Lou After I kill an app by typing bin/yarn rmadmin app -kill APP_ID, no job info is kept on the JHS web page. However, when I kill a job by typing bin/mapred job -kill JOB_ID, I can see the killed job left on the JHS. Some Hive users are confused that their jobs have been killed but nothing is left on the JHS, and the killed app's info on the RM web page is not enough. (They kill jobs via ClientRMProtocol.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-374) Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication
[ https://issues.apache.org/jira/browse/YARN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395158#comment-14395158 ] Junping Du commented on YARN-374: - Given that we already have the generic history server (now the timeline server), which tracks YARN applications that get killed, I will resolve this issue. Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication -- Key: YARN-374 URL: https://issues.apache.org/jira/browse/YARN-374 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Affects Versions: 2.0.1-alpha Reporter: Nemon Lou After I kill an app by typing bin/yarn rmadmin app -kill APP_ID, no job info is kept on the JHS web page. However, when I kill a job by typing bin/mapred job -kill JOB_ID, I can see the killed job left on the JHS. Some Hive users are confused that their jobs have been killed but nothing is left on the JHS, and the killed app's info on the RM web page is not enough. (They kill jobs via ClientRMProtocol.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2969) allocate resource on different nodes for task
[ https://issues.apache.org/jira/browse/YARN-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395175#comment-14395175 ] Junping Du commented on YARN-2969: -- Duplicate of YARN-1042: add ability to specify affinity/anti-affinity in container requests. allocate resource on different nodes for task - Key: YARN-2969 URL: https://issues.apache.org/jira/browse/YARN-2969 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Yang Hao With the help of Slider, YARN will become a common resource-managing OS, and some applications would like to allocate containers (or components, in Slider terms) on different nodes, so a configuration for allocating resources on different nodes would be helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-197) Add a separate log server
[ https://issues.apache.org/jira/browse/YARN-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394939#comment-14394939 ] Junping Du commented on YARN-197: - Hi [~seth.siddha...@gmail.com], given that we already have the generic Application History Server (deprecated), the timeline server (v1) is there, and timeline service v2 is in development, it seems unnecessary to have a separate log server now. Can we close it? Add a separate log server - Key: YARN-197 URL: https://issues.apache.org/jira/browse/YARN-197 Project: Hadoop YARN Issue Type: New Feature Reporter: Siddharth Seth Currently, the job history server is being used for log serving. A separate log server can be added which can deal with serving logs, along with other functionality like log retention, merging, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-329) yarn CHANGES.txt link missing from docs Reference
[ https://issues.apache.org/jira/browse/YARN-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395063#comment-14395063 ] Allen Wittenauer commented on YARN-329: --- Removing the fix version because we need an actual patch to point to... yarn CHANGES.txt link missing from docs Reference - Key: YARN-329 URL: https://issues.apache.org/jira/browse/YARN-329 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Thomas Graves Priority: Minor Looking at the hadoop 0.23 docs: http://hadoop.apache.org/docs/r0.23.5/ There is no link to the yarn CHANGES.txt in the Reference menu on the left side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-463) Show explicitly excluded nodes on the UI
[ https://issues.apache.org/jira/browse/YARN-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-463. - Resolution: Implemented We already show decommissioned nodes on the UI page, so resolving this JIRA. Show explicitly excluded nodes on the UI Key: YARN-463 URL: https://issues.apache.org/jira/browse/YARN-463 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Labels: usability Nodes can be explicitly excluded via the config yarn.resourcemanager.nodes.exclude-path. We should have a way of displaying this list via web and command line UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Summary: Implement a FairOrderingPolicy (was: Implement a Fair SchedulerOrderingPolicy) Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch
Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for a sizeBasedWeight style adjustment.
An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending):
- Current resource usage - less usage is lesser
- Submission time - earlier is lesser
Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2)
In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
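Put together, the ordering rules in the description amount to the comparator below: a minimal sketch, with the SchedulerProcess reduced to the fields the comparison uses and sizeBasedWeight applied exactly as the formula above states.
{code}
import java.util.Comparator;

class FairComparatorSketch implements Comparator<Proc> {
  private final boolean sizeBasedWeight;

  FairComparatorSketch(boolean sizeBasedWeight) {
    this.sizeBasedWeight = sizeBasedWeight;
  }

  // Usage, optionally divided by log1p(demand)/log(2) to boost large apps.
  private double weightedUsage(Proc p) {
    double usage = p.usedMemory;
    if (sizeBasedWeight && p.demandMemory > 0) {
      usage /= Math.log1p(p.demandMemory) / Math.log(2);
    }
    return usage;
  }

  @Override
  public int compare(Proc a, Proc b) {
    int cmp = Double.compare(weightedUsage(a), weightedUsage(b));
    if (cmp == 0) {
      cmp = Long.compare(a.submissionTime, b.submissionTime); // earlier is lesser
    }
    if (cmp == 0) {
      cmp = a.name.compareTo(b.name); // lexical FIFO tiebreak
    }
    return cmp;
  }
}

class Proc {
  String name;
  long usedMemory;
  long demandMemory;
  long submissionTime;
}
{code}
Iterating this order ascending gives the assignment order (least usage first); iterating descending gives the preemption order.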
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Summary: Create Initial OrderingPolicy Framework and FifoOrderingPolicy (was: Create Initial OrderingPolicy Framework) Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Description: Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy (was: Create the initial framework required for using OrderingPolicies) Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395508#comment-14395508 ] Craig Welch commented on YARN-3318: --- [~vinodkv]
bq. ...We can strictly focus on the policy framework here...
Sure, limited the patch to the framework.
bq. ...You could also say SchedulableProcess...
SchedulableProcess it is, done.
bq. I agree to this, but we are not in a position to support the APIs, CLI, config names in a supportable manner yet. They may or may not change depending on how parent queue policies, limit policies evolve. For that reason alone, I am saying that (1) Don't make the configurations public yet, or put a warning saying that they are unstable and (2) don't expose them in CLI, REST APIs yet. It's okay to put in the web UI, web UI scraping is not a contract.
You can't see it, because it's part of the Capacity Scheduler integration, but I removed the CLI and proto related changes. There was no REST API change; the web UI change is still present. I will warn unstable when they are added to config files in the scheduler integration patch.
bq. SchedulerApplicationAttempt.getDemand() should be private
Done.
bq. updateCaches() - updateState() / updateSchedulingState() as that is what it is doing? getCachedConsumption() / getCachedDemand(): simply getCurrent*()? What is the need for reorderOnContainerAllocate() / reorderOnContainerRelease()?
It is now getSchedulingConsumption(); getSchedulingDemand(); updateSchedulingState(). This is needed because mutable values which are used for ordering cannot be allowed to change for an item in the tree, or else it will not be found in some cases during the delete-before-reinsert process, which occurs when a schedulable's mutable values used in comparison change (for fairness, changes to consumption and potentially demand). Not all OrderingPolicies require reordering on these events; for efficiency they get to decide if they do or not, hence the reorderOn* methods. They are now reorderForContainerAllocation and reorderForContainerRelease.
bq. Move all the comparator related classes into their own package
No longer needed, as comparators are now just a property of policies; see below for details.
bq. This is really a ComparatorBasedOrderingPolicy. Do we really see non-comparator based ordering-policy. We are unnecessarily adding two abstractions - adding policies and comparators
Originally, there was a perceived need to be able to support a more flexible interface than the comparator one, but also a desire to build up a simpler, composable abstraction to be used with an instance of the former which had most of the hard stuff done. Given that all of the policies we've contemplated building fit the latter abstraction, and the level of flexibility does not appear to actually be that different, I think it's fair to say that we only need what was previously the SchedulerComparator abstraction as a plugin point. Given that, a slightly refactored version of the SchedulerComparator abstraction is now the only plugin point and is now what goes by the name of OrderingPolicy. What was previously the OrderingPolicy is now a single concrete class implementing the surrounding logic, meant to be usable from any scheduler, named SchedulingOrder. So, one abstraction, a comparator-based ordering-policy.
If we really do find we need a flexibility we don't have some day, the SchedulingOrder class could be abstracted to provide that higher level abstraction - but as we see no need for it now, and it appears we probably never will, there's no reason to do so at present.
bq. ...Use className.getName()...
Done.
[~leftnoteasy]
bq. ...I prefer what Vinod suggested, split SchedulerProcess to be QueueSchedulable and AppSchedulable ...
I don't see that he has suggested that. In any case, with the removal of *Serial* and the move to compareInputOrderTo() I don't at present see a need to have separate subtypes for app and queue to avoid dangling properties. And, I think if we do it right we won't end up introducing them. By splitting in the suggested way we commit ourselves to either multiple comparators (to use the differing functionality) or awkward testing of subtype/etc. logic in one comparator - so it basically moves the complexity/awkwardness, it doesn't eliminate it. I've refactored such that the Policy now provides a Comparator, as opposed to extending it, so there is now room for it to provide multiple comparators and handle subtypes if need be, but I think we should wait until we see that we must do that before doing so, as I don't believe we will end up needing to (but if we do, existing code should need little change, and implementing what you suggest should be essentially additive...)
bq. ...About inheritance relationships between interfaces/classes...
Policies will be composed to achieve combined capabilities yet the collection of
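The delete-before-reinsert constraint defended above is the standard rule for Java's ordered collections: once an element's comparison key mutates while the element sits in a TreeSet, lookups and removals can miss it, because the tree's binary search follows the stale position. A self-contained demonstration (names illustrative, not YARN classes):
{code}
import java.util.Comparator;
import java.util.TreeSet;

public class ReorderDemo {
  static class Proc {
    final String name;
    long usage;
    Proc(String name, long usage) { this.name = name; this.usage = usage; }
  }

  public static void main(String[] args) {
    TreeSet<Proc> order = new TreeSet<>(
        Comparator.<Proc>comparingLong(p -> p.usage)
            .thenComparing(p -> p.name));
    Proc a = new Proc("a", 10);
    order.add(a);
    order.add(new Proc("b", 20));

    // Wrong: mutating in place strands 'a' at its old position, and a later
    // order.remove(a) could return false.
    // Right: remove, mutate, reinsert (what hooks in the style of
    // reorderForContainerAllocation make possible).
    order.remove(a);
    a.usage = 30;
    order.add(a);

    System.out.println(order.first().name); // prints "b", now least usage
  }
}
{code}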
[jira] [Commented] (YARN-2893) AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395412#comment-14395412 ] zhihai xu commented on YARN-2893: - Hi [~jira.shegalov], I can catch the exception for all the code:
{code}
try {
  Credentials credentials = parseCredentials(submissionContext);
  if (UserGroupInformation.isSecurityEnabled()) {
    this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId,
        credentials, submissionContext.getCancelTokensWhenComplete(),
        application.getUser());
  } else {
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppEvent(applicationId, RMAppEventType.START));
  }
} catch (Exception e) {
  LOG.warn("Unable to parse credentials.", e);
  // Sending APP_REJECTED is fine, since we assume that the
  // RMApp is in NEW state and thus we haven't yet informed the
  // scheduler about the existence of the application
  assert application.getState() == RMAppState.NEW;
  this.rmContext.getDispatcher().getEventHandler()
      .handle(new RMAppRejectedEvent(applicationId, e.getMessage()));
  throw RPCUtil.getRemoteException(e);
}
{code}
Are you OK with the above change? I think it is better to parse the credentials and catch the exception in the security-not-enabled case as well, so we can find corrupted credentials from the client earlier. AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-505) NPE at AsyncDispatcher$GenericEventHandler
[ https://issues.apache.org/jira/browse/YARN-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-505. - Resolution: Won't Fix NPE at AsyncDispatcher$GenericEventHandler -- Key: YARN-505 URL: https://issues.apache.org/jira/browse/YARN-505 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0 Reporter: Przemyslaw Pretki Priority: Minor Steps to reproduce:
{code}
@Test
public void testAsyncDispatcher() {
  AsyncDispatcher dispatcher = new AsyncDispatcher();
  EventHandler handler = dispatcher.getEventHandler();
  handler.handle(null);
}
{code}
Moreover, an event taken from a *BlockingQueue* will never be *null*, so it seems that the following condition is not necessary (AsyncDispatcher.createThread() method):
{code}
if (event != null) {
  dispatch(event);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-551) The option shell_command of DistributedShell had better support compound command
[ https://issues.apache.org/jira/browse/YARN-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395475#comment-14395475 ] Junping Du commented on YARN-551: - DistributedShell supports the option --shell_args to put extra args after the shell command. Resolving this JIRA as not a problem. The option shell_command of DistributedShell had better support compound command - Key: YARN-551 URL: https://issues.apache.org/jira/browse/YARN-551 Project: Hadoop YARN Issue Type: Improvement Reporter: rainy Yu The option shell_command of DistributedShell must be a single command such as 'ls'; it cannot be a compound command such as 'ps -ef' that includes blank characters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-551) The option shell_command of DistributedShell had better support compound command
[ https://issues.apache.org/jira/browse/YARN-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-551. - Resolution: Not a Problem The option shell_command of DistributedShell had better support compound command - Key: YARN-551 URL: https://issues.apache.org/jira/browse/YARN-551 Project: Hadoop YARN Issue Type: Improvement Reporter: rainy Yu The option shell_command of DistributedShell must be a single command such as 'ls'; it cannot be a compound command such as 'ps -ef' that includes blank characters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-375) FIFO scheduler may crash due to buggy app
[ https://issues.apache.org/jira/browse/YARN-375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395503#comment-14395503 ] Junping Du commented on YARN-375: - This won't happen, because after the AM sends resource requests to the RM (via ApplicationMasterProtocol) in allocate(AllocateRequest request), the RM will do a sanity check against them, which includes checking memory < 0. Related code pieces: In ApplicationMasterService.java,
{code}
RMServerUtils.validateResourceRequests(ask,
    rScheduler.getMaximumResourceCapability());
{code}
In RMServerUtils.java,
{code}
public static void validateResourceRequest(ResourceRequest resReq,
    Resource maximumResource) throws InvalidResourceRequestException {
  if (resReq.getCapability().getMemory() < 0 ||
      resReq.getCapability().getMemory() > maximumResource.getMemory()) {
    throw new InvalidResourceRequestException("Invalid resource request"
        + ", requested memory < 0"
        + ", or requested memory > max configured"
        + ", requestedMemory=" + resReq.getCapability().getMemory()
        + ", maxMemory=" + maximumResource.getMemory());
  }
  ...
{code}
Will resolve this JIRA as not a problem. FIFO scheduler may crash due to buggy app -- Key: YARN-375 URL: https://issues.apache.org/jira/browse/YARN-375 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Arun C Murthy Priority: Critical The following code should check for a 0 return value rather than crash!
{code}
int availableContainers =
    node.getAvailableResource().getMemory() / capability.getMemory();
// TODO: A buggy application with this zero would crash the scheduler.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-375) FIFO scheduler may crash due to buggy app
[ https://issues.apache.org/jira/browse/YARN-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-375. - Resolution: Not a Problem FIFO scheduler may crash due to buggy app -- Key: YARN-375 URL: https://issues.apache.org/jira/browse/YARN-375 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Arun C Murthy Priority: Critical The following code should check for a 0 return value rather than crash!
{code}
int availableContainers =
    node.getAvailableResource().getMemory() / capability.getMemory();
// TODO: A buggy application with this zero would crash the scheduler.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Description:
For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can be hours. If we are to relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize throughput and minimize locking time.
Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases, each lookup is much faster. This can also help with I/O, by putting the entity and index databases on separate disks.
Rolling DBs for the entity and index DBs. 99.9% of the data are in these two sections, at a 4:1 ratio (index to entity), at least for Tez. We replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always place an entity's events into its correct rolling DB instance based on start time. This allows us to stitch the data back together while reading, with artificial paging.
Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster.
Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in such a way that will trend towards sequential write performance over random write performance.
was: For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. A write lock is held during the entire deletion phase, which in practice can be hours. An alternative is to create a rolling set of databases that age out and can be efficiently removed via a recursive directory delete. This removes the lock in the deletion thread, and clients and servers can share access to the underlying database, which already implements its own internal locking mechanism.
Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles
For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can be hours. If we are to relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize throughput and minimize locking time.
Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases, each lookup is much faster. This can also help with I/O, by putting the entity and index databases on separate disks.
Rolling DBs for the entity and index DBs. 99.9% of the data are in these two sections, at a 4:1 ratio (index to entity), at least for Tez. We replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always place an entity's events into its correct rolling DB instance based on start time. This allows us to stitch the data back together while reading, with artificial paging.
Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster.
Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in such a way that will trend towards sequential write performance over random write performance.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
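The rolling-database idea above can be sketched against the leveldbjni API that LeveldbTimelineStore already uses; the bucket naming and time-bucket granularity below are illustrative assumptions, not the eventual plugin's layout.
{code}
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class RollingLevelDbSketch {
  private final File baseDir;
  private final long bucketMillis; // e.g. one DB instance per hour
  private final TreeMap<Long, DB> buckets = new TreeMap<>();

  public RollingLevelDbSketch(File baseDir, long bucketMillis) {
    this.baseDir = baseDir;
    this.bucketMillis = bucketMillis;
  }

  // Writes are routed by the entity's *start time*, never by wall clock,
  // so reads can deterministically stitch buckets back together.
  public DB dbFor(long startTime) throws Exception {
    long bucket = startTime - (startTime % bucketMillis);
    DB db = buckets.get(bucket);
    if (db == null) {
      db = JniDBFactory.factory.open(new File(baseDir, "ts-" + bucket),
          new Options().createIfMissing(true));
      buckets.put(bucket, db);
    }
    return db;
  }

  // Retention becomes a cheap directory delete instead of holding a write
  // lock while deleting entities one record at a time.
  public void evictBucketsBefore(long cutoff) throws Exception {
    Map<Long, DB> old = buckets.headMap(cutoff);
    for (Map.Entry<Long, DB> e : old.entrySet()) {
      e.getValue().close();
      deleteRecursively(new File(baseDir, "ts-" + e.getKey()));
    }
    old.clear();
  }

  private static void deleteRecursively(File f) {
    File[] children = f.listFiles();
    if (children != null) {
      for (File c : children) {
        deleteRecursively(c);
      }
    }
    f.delete();
  }
}
{code}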
[jira] [Assigned] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles reassigned YARN-3448: - Assignee: Jonathan Eagles Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles
For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can be hours. If we are to relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize throughput and minimize locking time.
Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on the unique usage patterns of each database. With 5 separate databases, each lookup is much faster. This can also help with I/O, by putting the entity and index databases on separate disks.
Rolling DBs for the entity and index DBs. 99.9% of the data are in these two sections, at a 4:1 ratio (index to entity), at least for Tez. We replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always place an entity's events into its correct rolling DB instance based on start time. This allows us to stitch the data back together while reading, with artificial paging.
Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster.
Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in such a way that will trend towards sequential write performance over random write performance.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Summary: Create Initial OrderingPolicy Framework (was: Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior) Create Initial OrderingPolicy Framework --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Description: Create the initial framework required for using OrderingPolicies (was: Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior.) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.45.patch Create Initial OrderingPolicy Framework --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-505) NPE at AsyncDispatcher$GenericEventHandler
[ https://issues.apache.org/jira/browse/YARN-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395456#comment-14395456 ] Junping Du commented on YARN-505: - We should never handle a null event in AsyncDispatcher; this is a basic assumption for every call of handle(event) on AsyncDispatcher in YARN. Our practice is that we don't null-check an object that is not supposed to be null. If it does become null in some situation, we fix that situation, because it is not what we expect. In this case, the NPE is a warning that something unexpected has happened. I will resolve this JIRA as Won't Fix. NPE at AsyncDispatcher$GenericEventHandler -- Key: YARN-505 URL: https://issues.apache.org/jira/browse/YARN-505 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0 Reporter: Przemyslaw Pretki Priority: Minor Steps to reproduce: {code}
@Test
public void testAsyncDispatcher() {
  AsyncDispatcher dispatcher = new AsyncDispatcher();
  EventHandler handler = dispatcher.getEventHandler();
  handler.handle(null);
}
{code} Moreover, an event taken from *BlockingQueue* will never be *null*, so the following condition in the AsyncDispatcher.createThread() method seems unnecessary: {code}
if (event != null) {
  dispatch(event);
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
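For contrast, a minimal sketch of the fail-fast alternative (a hypothetical class, not the actual AsyncDispatcher code): rejecting null at the handle() boundary surfaces the bug at the caller instead of as a later NPE inside the dispatch thread:
{code}
import java.util.Objects;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class FailFastHandlerSketch<E> {
  private final BlockingQueue<E> eventQueue = new LinkedBlockingQueue<>();

  public void handle(E event) {
    // Fail fast with a descriptive message instead of a later NPE.
    Objects.requireNonNull(event, "null event passed to the dispatcher");
    eventQueue.offer(event);
  }
}
{code}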
[jira] [Updated] (YARN-3435) AM container to be allocated Appattempt AM container shown as null
[ https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3435: --- Component/s: resourcemanager AM container to be allocated Appattempt AM container shown as null -- Key: YARN-3435 URL: https://issues.apache.org/jira/browse/YARN-3435 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Environment: 1RM,1DN Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Attachments: Screenshot.png, YARN-3435.001.patch Submit a YARN application, then open http://rm:8088/cluster/appattempt/appattempt_1427984982805_0003_01 before the AM container is allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3436) Doc WebServicesIntro.html Example Rest API url wrong
[ https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3436: --- Component/s: resourcemanager documentation Doc WebServicesIntro.html Example Rest API url wrong Key: YARN-3436 URL: https://issues.apache.org/jira/browse/YARN-3436 Project: Hadoop YARN Issue Type: Bug Components: documentation, resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: YARN-3436.001.patch /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html {quote} Response Examples JSON response with single resource HTTP Request: GET http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001 Response Status Line: HTTP/1.1 200 OK {quote} The URL should be ws/v1/cluster/{color:red}apps{color}. Two examples on the same page are wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500
[ https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-800. - Resolution: Duplicate Clicking on an AM link for a running app leads to a HTTP 500 Key: YARN-800 URL: https://issues.apache.org/jira/browse/YARN-800 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Priority: Minor Clicking the AM link tries to open a page with a URL like http://hostname:8088/proxy/application_1370886527995_0645/ and this leads to an HTTP 500. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395518#comment-14395518 ] Hadoop QA commented on YARN-3318: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709373/YARN-3318.47.patch against trunk revision ef591b1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1148 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7217//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7217//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7217//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7217//console This message is automatically generated. Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395255#comment-14395255 ] Gera Shegalov commented on YARN-2893: - Thanks [~zxu] for the patch, and apologies for the delay. I skimmed over the patch, and it looks good overall. Can you keep your logic in {{RMAppManager#submitApplication}} with parseCredentials, but put it back under {{if (UserGroupInformation.isSecurityEnabled()) {}}? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.45.patch Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight-style adjustment. Optionally, based on a configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the value below: Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to a comparison based on the application id, which is generally lexically FIFO for that comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
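A minimal sketch of the ordering described above, with hypothetical stand-in types (the real policy operates on SchedulerProcess): usage ascending, the optional sizeBasedWeight division by Math.log1p(demand) / Math.log(2), and an application-id tie-break:
{code}
import java.util.Comparator;

class FairComparatorSketch implements Comparator<AppUsage> {
  private final boolean sizeBasedWeight; // default false per the description

  FairComparatorSketch(boolean sizeBasedWeight) {
    this.sizeBasedWeight = sizeBasedWeight;
  }

  private double magnitude(AppUsage a) {
    double usage = a.currentUsageMB;
    if (sizeBasedWeight && a.memoryDemandMB > 0) {
      // Boost larger applications by shrinking their effective usage.
      usage /= Math.log1p(a.memoryDemandMB) / Math.log(2);
    }
    return usage;
  }

  @Override
  public int compare(AppUsage x, AppUsage y) {
    int c = Double.compare(magnitude(x), magnitude(y));
    // Indeterminate comparisons fall back to the application id,
    // which is generally lexically FIFO.
    return c != 0 ? c : x.applicationId.compareTo(y.applicationId);
  }
}

class AppUsage { // hypothetical stand-in for SchedulerProcess
  long currentUsageMB;
  long memoryDemandMB;
  String applicationId;
}
{code}
Allocation would iterate applications sorted ascending by this comparator; preemption would walk the same ordering in reverse.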
[jira] [Resolved] (YARN-520) webservices API ws/v1/cluster/nodes doesn't return LOST nodes
[ https://issues.apache.org/jira/browse/YARN-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-520. - Resolution: Duplicate webservices API ws/v1/cluster/nodes doesn't return LOST nodes - Key: YARN-520 URL: https://issues.apache.org/jira/browse/YARN-520 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 0.23.6 Reporter: Nathan Roberts webservices API ws/v1/cluster/nodes doesn't return LOST nodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-520) webservices API ws/v1/cluster/nodes doesn't return LOST nodes
[ https://issues.apache.org/jira/browse/YARN-520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395465#comment-14395465 ] Junping Du commented on YARN-520: - This is already addressed and resolved in YARN-642. Marking it as a duplicate. webservices API ws/v1/cluster/nodes doesn't return LOST nodes - Key: YARN-520 URL: https://issues.apache.org/jira/browse/YARN-520 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 0.23.6 Reporter: Nathan Roberts webservices API ws/v1/cluster/nodes doesn't return LOST nodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Description: Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight-style adjustment. Optionally, based on a configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the value below: Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to a comparison based on the application id, which is generally lexically FIFO for that comparison. (was: Implement a Fair Comparator for the Scheduler Comparator Ordering Policy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight-style adjustment. An implementation of a Scheduler Comparator for use with the Scheduler Comparator Ordering Policy will be built with the below comparison for ordering applications for container assignment (ascending) and for preemption (descending): current resource usage (less usage is lesser), then submission time (earlier is lesser). Optionally, based on a configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the value below: Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to a comparison based on the application name, which is lexically FIFO for that comparison (first submitted is lesser).) Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight-style adjustment. Optionally, based on a configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the value below: Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to a comparison based on the application id, which is generally lexically FIFO for that comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-402) Dispatcher warn message is too late
[ https://issues.apache.org/jira/browse/YARN-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395402#comment-14395402 ] Junping Du commented on YARN-402: - Thanks [~lohit] for reporting this issue. I think it could be a little too aggressive to warn when the queue is half full. By default, the capacity of LinkedBlockingQueue is Integer.MAX_VALUE, which is 2^31-1. Half full means ~2^30 slots are still available, so that could be too early. Do we want a configurable value here? I think that could be a little overkill; if so, we may need to pick a more reasonable fixed value instead. IMO, rmDispatcher could be the busiest AsyncDispatcher in YARN today: RMNodeEvent, SchedulerEvent, RMAppEvent, RMAppAttemptEvent, NodeListManagerEvent, AMLauncherEvent, etc. are all dispatched on this single dispatcher. Among these, SchedulerEvent seems to be the most active: let's assume thousands of node events and thousands of application attempt events are generated per second (the default interval for NM-RM heartbeats and AMRMClientAsync heartbeats to the RM) in a large cluster; then roughly 10*1000 scheduler events could arrive at rmDispatcher, and we can estimate up to 10*(10*1000) events per second there (including events other than SchedulerEvent). Based on this assumption, if we want to warn 10 seconds before the queue gets full (assuming dequeue operations get slow), maybe 10 (seconds) * 10 (event types on rmDispatcher) * (10*1000) (scale of nodes and apps per interval) is a reasonable value? In addition, I think we should fix a tiny issue in the code below: (qSize % 1000 == 0) on its own doesn't make much sense when the default capacity is 2^31-1: {code}
int qSize = eventQueue.size();
if (qSize != 0 && qSize % 1000 == 0) {
  LOG.info("Size of event-queue is " + qSize);
}
int remCapacity = eventQueue.remainingCapacity();
if (remCapacity < 1000) {
  LOG.warn("Very low remaining capacity in the event-queue: " + remCapacity);
}
{code} Dispatcher warn message is too late --- Key: YARN-402 URL: https://issues.apache.org/jira/browse/YARN-402 Project: Hadoop YARN Issue Type: Improvement Reporter: Lohit Vijayarenu Priority: Minor AsyncDispatcher logs a warning when the remaining capacity is less than 1000: {noformat}
if (remCapacity < 1000) {
  LOG.warn("Very low remaining capacity in the event-queue: " + remCapacity);
}
{noformat} What would be useful is to warn much before that, maybe at half full instead of when the queue is almost completely full. I see that the eventQueue capacity is an int value, so if the queue has only 1000 capacity left when the warning fires, the service definitely has a serious problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
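A minimal sketch of the warn-threshold idea from the comment above, with hypothetical constants (System.err stands in for the LOG calls): warn at the smaller of half the queue capacity and a fixed headroom, instead of only when 1000 slots remain:
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueWatchSketch {
  // The 1000*1000 headroom estimated in the comment above.
  private static final int FIXED_HEADROOM = 1000 * 1000;
  private final BlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();

  void maybeWarn() {
    int remaining = eventQueue.remainingCapacity();
    // size() + remainingCapacity() recovers the configured capacity.
    int capacity = eventQueue.size() + remaining;
    int threshold = Math.min(capacity / 2, FIXED_HEADROOM);
    if (remaining < threshold) {
      System.err.println("Low remaining capacity in the event-queue: " + remaining);
    }
  }
}
{code}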
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395427#comment-14395427 ] Hadoop QA commented on YARN-3318: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709343/YARN-3318.45.patch against trunk revision 023133c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1148 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7216//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7216//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7216//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7216//console This message is automatically generated. Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3449) Recover appTokenKeepAliveMap upon nodemanager restart
Junping Du created YARN-3449: Summary: Recover appTokenKeepAliveMap upon nodemanager restart Key: YARN-3449 URL: https://issues.apache.org/jira/browse/YARN-3449 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0, 2.7.0 Reporter: Junping Du Assignee: Junping Du appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application alive after the application has finished but the NM still needs the app token to do log aggregation (when security and log aggregation are enabled). Applications are only inserted into this map when getApplicationsToCleanup() is received in the RM heartbeat response, and the RM sends this info only once, in RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). NM work-preserving restart should put appTokenKeepAliveMap into the NMStateStore so it gets recovered after restart. Without this, the RM could terminate the application earlier, so log aggregation could fail when security is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
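A minimal sketch of the recovery flow being proposed, with a hypothetical store interface standing in for NMStateStoreService (names are illustrative, not the actual API):
{code}
import java.util.HashMap;
import java.util.Map;

class KeepAliveRecoverySketch {
  interface StateStore { // hypothetical stand-in for NMStateStoreService
    void storeKeepAlive(String appId, long expiryMs);
    Map<String, Long> loadKeepAlives();
  }

  private final Map<String, Long> appTokenKeepAliveMap = new HashMap<>();
  private final StateStore store;

  KeepAliveRecoverySketch(StateStore store) {
    this.store = store;
    // Recover entries persisted before the restart so the RM is not
    // told to stop keeping these applications alive too early.
    appTokenKeepAliveMap.putAll(store.loadKeepAlives());
  }

  void addKeepAlive(String appId, long expiryMs) {
    appTokenKeepAliveMap.put(appId, expiryMs);
    store.storeKeepAlive(appId, expiryMs); // survives work-preserving restart
  }
}
{code}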
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.47.patch Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight-style adjustment. Optionally, based on a configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the value below: Math.log1p(app memory demand) / Math.log(2). In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to a comparison based on the application id, which is generally lexically FIFO for that comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.47.patch Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
Jonathan Eagles created YARN-3448: - Summary: Add Rolling Time To Lives Level DB Plugin Capabilities Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. A write lock is held during the entire deletion phase, which in practice can be hours. An alternative is to create a rolling set of databases that age out and can be efficiently removed via a recursive directory delete. This removes the lock in the deletion thread, and clients and servers can share access to the underlying database, which already implements its own internal locking mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-402) Dispatcher warn message is too late
[ https://issues.apache.org/jira/browse/YARN-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395408#comment-14395408 ] Junping Du commented on YARN-402: - I forgot that the queue can be constructed with a queue type other than LinkedBlockingQueue. So maybe the threshold should be the smaller of half the queue size (when the queue is not the default LinkedBlockingQueue) and 1000*1000 (estimated as above). Dispatcher warn message is too late --- Key: YARN-402 URL: https://issues.apache.org/jira/browse/YARN-402 Project: Hadoop YARN Issue Type: Improvement Reporter: Lohit Vijayarenu Priority: Minor AsyncDispatcher logs a warning when the remaining capacity is less than 1000: {noformat}
if (remCapacity < 1000) {
  LOG.warn("Very low remaining capacity in the event-queue: " + remCapacity);
}
{noformat} What would be useful is to warn much before that, maybe at half full instead of when the queue is almost completely full. I see that the eventQueue capacity is an int value, so if the queue has only 1000 capacity left when the warning fires, the service definitely has a serious problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395409#comment-14395409 ] zhihai xu commented on YARN-2893: - [~jira.shegalov], thanks for the review. I can put back catching the exception under {{if (UserGroupInformation.isSecurityEnabled()) {}}. I will keep the change to parseCredentials for the security-disabled case, so we can also reject an application with corrupted credentials in a non-secure setup. Are you OK with that? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
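The validation under discussion amounts to parsing the serialized tokens at submission time so a corrupted payload is rejected up front rather than failing later in AMLauncher. A minimal sketch using the Hadoop Credentials and DataInputByteBuffer classes (the wrapper method is a hypothetical stand-in for the RMAppManager logic):
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.io.DataInputByteBuffer;
import org.apache.hadoop.security.Credentials;

class CredentialsCheckSketch {
  static Credentials parseCredentials(ByteBuffer tokens) throws IOException {
    DataInputByteBuffer in = new DataInputByteBuffer();
    tokens.rewind();
    in.reset(tokens);
    Credentials credentials = new Credentials();
    // Throws (e.g. EOFException) on truncated or corrupt token data.
    credentials.readTokenStorageStream(in);
    return credentials;
  }
}
{code}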
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Attachment: YARN-3448.1.patch Add Rolling Time To Lives Level DB Plugin Capabilities -- Key: YARN-3448 URL: https://issues.apache.org/jira/browse/YARN-3448 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3448.1.patch For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can be hours. If we are willing to relax some of the consistency constraints, other performance-enhancing techniques can be employed to maximize throughput and minimize locking time.
Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on its unique usage pattern. With 5 separate databases, each lookup is much faster. It can also help with I/O to have the entity and index databases on separate disks.
Rolling DBs for the entity and index DBs. 99.9% of the data is in these two sections, at a 4:1 ratio (index to entity), at least for Tez. We can replace DB record removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must add a constraint that an entity's events are always placed into the correct rolling DB instance based on start time. This allows us to stitch the data back together while reading and to do artificial paging.
Relax the synchronous-write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster.
Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that trends toward sequential write performance over random write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.48.patch The javac error looks bogus; an existing error has simply moved. The findbugs warning also looks bogus; the class it complains about is static. Uploading a new version to see if it notices now. TestFairScheduler passes on my box with the patch, and I can't see any way it would be affected. Tests will rerun with the new patch, so we'll see. Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395566#comment-14395566 ] Hadoop QA commented on YARN-3318: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709391/YARN-3318.48.patch against trunk revision ef591b1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1148 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7218//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7218//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7218//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7218//console This message is automatically generated. Create Initial OrderingPolicy Framework and FifoOrderingPolicy -- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, YARN-3318.47.patch, YARN-3318.48.patch Create the initial framework required for using OrderingPolicies and an initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394437#comment-14394437 ] Junping Du commented on YARN-3437: -- Thanks [~sjlee0] for delivering a patch here! From a quick pass over the patch, it looks like we are generating one app collector per map task. I think this is good for scalability testing of the backend storage, which can be the bottleneck in mainstream cases. In addition, do we want to address some extreme cases, e.g. a huge application with hundreds of thousands or even millions of tasks? If so, maybe we also want to know a single app collector's bottleneck for accepting/forwarding messages from hundreds of thousands of maps. Also, in a real cluster, the mappings from cluster to apps and from app to tasks are all 1-N mappings. Maybe making the app aggregator count configurable (just like the map task count, bytes per map, etc.) is something we can do as a next step? BTW, it has some duplicated code with YARN-2556 (like TimelineServerPerformance.java). YARN-2556 looks to be in pretty good shape and could go into trunk and branch-2 quickly, so I would suggest watching that JIRA's status and doing the necessary rebase work if that patch goes in; we may want to merge it into the YARN-2928 branch soon. convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3444) Fixed typo (capability)
[ https://issues.apache.org/jira/browse/YARN-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394583#comment-14394583 ] Hadoop QA commented on YARN-3444: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709238/YARN-3444.patch against trunk revision 72f6bd4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7212//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7212//console This message is automatically generated. Fixed typo (capability) --- Key: YARN-3444 URL: https://issues.apache.org/jira/browse/YARN-3444 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Gabor Liptak Priority: Minor Attachments: YARN-3444.patch Fixed typo (capability) -- This message was sent by Atlassian JIRA (v6.3.4#6332)