[jira] [Updated] (YARN-2161) Fix build on macosx: YARN parts
[ https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated YARN-2161: Attachment: YARN-2161.v1.patch Fix build on macosx: YARN parts --- Key: YARN-2161 URL: https://issues.apache.org/jira/browse/YARN-2161 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2161.v1.patch When compiling on macosx with -Pnative, there are several warnings and errors; fixing these would help Hadoop developers working in a macosx environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2161) Fix build on macosx: YARN parts
[ https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032164#comment-14032164 ] Binglin Chang commented on YARN-2161: - Changes: container-executor.c: 1. make mkdirs more compatible by removing the usage of mkdirat/openat; 2. use sysconf() to get LOGIN_NAME_MAX; 3. macosx doesn't have fcloseall, so close all opened fds explicitly on macosx; 4. disable cgroups on macosx. test-container-executor.c: 1. macosx does not have the user 'bin', so skip that check; 2. change /etc/passwd (which does not exist on mac) to /bin/ls. Fix build on macosx: YARN parts --- Key: YARN-2161 URL: https://issues.apache.org/jira/browse/YARN-2161 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2161.v1.patch When compiling on macosx with -Pnative, there are several warnings and errors; fixing these would help Hadoop developers working in a macosx environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032167#comment-14032167 ] Hadoop QA commented on YARN-2032: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644451/YARN-2032-branch-2-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3992//console This message is automatically generated. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-branch-2-1.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anders updated YARN-2142: - Attachment: trust.patch This patch file is based on version 2.2.0; it works on my computer. If you have any questions, please tell me. On the web UI, the function is not yet complete (but it seems to work well). Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: trust.patch Original Estimate: 1m Remaining Estimate: 1m Because of a critical computing environment, we must test every node's TRUST status in the cluster (we can get the TRUST status via the API of the OAT server), so I added this feature into Hadoop's scheduling. Through the TRUST check service, a node can get its own TRUST status and then, through the heartbeat, send the TRUST status to the resource manager for scheduling. In the scheduling step, if a node's TRUST status is 'false', it will be abandoned until its TRUST status turns to 'true'. ***The logic of this feature is similar to the node's health check service. -- This message was sent by Atlassian JIRA (v6.2#6252)
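The attached trust.patch is not reproduced in this digest. As a rough, standalone sketch of the design described above (the node periodically asks the OAT server for its TRUST status, caches it, and reports it on the next heartbeat), something along the following lines would capture the idea; OatClient, NodeTrustChecker and the polling interval are illustrative assumptions, not classes from the patch.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Illustrative sketch of a node-side TRUST checker, modeled loosely on the
 * NodeManager's health-check service. OatClient is a hypothetical wrapper
 * around the OAT server API; it is not part of Hadoop or of the patch.
 */
public class NodeTrustChecker {

  /** Hypothetical client for the OAT attestation server. */
  public interface OatClient {
    boolean isTrusted(String hostname) throws Exception;
  }

  private final OatClient oatClient;
  private final String hostname;
  private final AtomicBoolean trusted = new AtomicBoolean(false);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public NodeTrustChecker(OatClient oatClient, String hostname) {
    this.oatClient = oatClient;
    this.hostname = hostname;
  }

  /** Poll the OAT server periodically, caching the last known status. */
  public void start(long intervalMs) {
    scheduler.scheduleWithFixedDelay(new Runnable() {
      @Override
      public void run() {
        try {
          trusted.set(oatClient.isTrusted(hostname));
        } catch (Exception e) {
          // Fail closed: treat the node as untrusted if the OAT server
          // cannot be reached.
          trusted.set(false);
        }
      }
    }, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  /** Value that would be piggybacked on the NM-to-RM heartbeat. */
  public boolean isTrusted() {
    return trusted.get();
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}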
[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status
[ https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032173#comment-14032173 ] Hadoop QA commented on YARN-2142: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650522/trust.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3993//console This message is automatically generated. Add one service to check the nodes' TRUST status - Key: YARN-2142 URL: https://issues.apache.org/jira/browse/YARN-2142 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager, scheduler Affects Versions: 2.2.0 Environment: OS:Ubuntu 13.04; JAVA:OpenJDK 7u51-2.4.4-0 Reporter: anders Priority: Minor Labels: patch Fix For: 2.2.0 Attachments: trust.patch Original Estimate: 1m Remaining Estimate: 1m Because of a critical computing environment, we must test every node's TRUST status in the cluster (we can get the TRUST status via the API of the OAT server), so I added this feature into Hadoop's scheduling. Through the TRUST check service, a node can get its own TRUST status and then, through the heartbeat, send the TRUST status to the resource manager for scheduling. In the scheduling step, if a node's TRUST status is 'false', it will be abandoned until its TRUST status turns to 'true'. ***The logic of this feature is similar to the node's health check service. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2161) Fix build on macosx: YARN parts
[ https://issues.apache.org/jira/browse/YARN-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032175#comment-14032175 ] Hadoop QA commented on YARN-2161: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650514/YARN-2161.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3991//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3991//console This message is automatically generated. Fix build on macosx: YARN parts --- Key: YARN-2161 URL: https://issues.apache.org/jira/browse/YARN-2161 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2161.v1.patch When compiling on macosx with -Pnative, there are several warning and errors, fix this would help hadoop developers with macosx env. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1782) CLI should let users to query cluster metrics
[ https://issues.apache.org/jira/browse/YARN-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-1782: -- Attachment: YARN-1782.patch Attached a patch. This patch introduces yarn metrics -status command which outputs like this. {noformat} $ yarn metrics -status 14/06/16 17:13:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/06/16 17:13:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Cluster Metrics : appsSubmmitted : 2 appsCompleted : 1 appsPending : 0 appsRunning : 1 appsFailed : 0 appsKilled : 0 reservedMB : 0 availableMB : 3072 allocatedMB : 5120 totalMB : 8192 containersAllocated : 4 containersReserved : 0 containersPending : 0 totalNodes : 1 activeNodes : 1 lostNodes : 0 unhealthyNodes : 0 decommissionedNodes : 0 rebootedNodes : 0 {noformat} CLI should let users to query cluster metrics - Key: YARN-1782 URL: https://issues.apache.org/jira/browse/YARN-1782 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Attachments: YARN-1782.patch Like RM webUI and RESTful services, YARN CLI should also enable users to query the cluster metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1782) CLI should let users to query cluster metrics
[ https://issues.apache.org/jira/browse/YARN-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima reassigned YARN-1782: - Assignee: Kenji Kikushima CLI should let users to query cluster metrics - Key: YARN-1782 URL: https://issues.apache.org/jira/browse/YARN-1782 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1782.patch Like RM webUI and RESTful services, YARN CLI should also enable users to query the cluster metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032226#comment-14032226 ] Zhijie Shen commented on YARN-2032: --- [~mayank_bansal], thanks for the patch! Here are some quick comments after a first glance. 1. Why does this patch target branch-2 instead of trunk? 2. The packages of the newly added classes need to be updated after YARN-2107. 3. The TTL configuration should be put in YarnConfiguration. Another concern is that the data retention policy is different between HBase and LevelDB. In LevelDB, we determine whether an entity is old enough according to TTL, and then delete it as well as its events. However, in the HBase impl, it seems that deletion depends on each column family's TTL individually. In this case, it is possible that the entity is deleted, but its events (or part of them) are still there. 4. fromId and fromTs seem not to be implemented yet. 5. Why do ENTITY_TABLE and INDEX_TABLE have the same schema? If I remember correctly, we only index against the primary filters. 6. Query parameters need to be fully functional, such as the secondary filters. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-branch-2-1.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.2#6252)
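On point 3 above: HBase enforces TTL per column family rather than per logical entity, which is why an entity's data and its events can expire independently. A minimal sketch of how a column-family TTL is set with the HBase client API; the table and family names here are made up for illustration and are not taken from the attached patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TimelineEntityTableSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical table and column-family names, for illustration only.
    HTableDescriptor entityTable =
        new HTableDescriptor(TableName.valueOf("timeline.entity"));

    HColumnDescriptor info = new HColumnDescriptor("i");
    HColumnDescriptor events = new HColumnDescriptor("e");

    // HBase applies TTL per column family, in seconds. If the two families
    // carry different TTLs (or compactions run at different times), the
    // entity's "info" cells and its "events" cells can expire independently,
    // which is the consistency concern raised above.
    int ttlSeconds = 7 * 24 * 60 * 60;
    info.setTimeToLive(ttlSeconds);
    events.setTimeToLive(ttlSeconds);

    entityTable.addFamily(info);
    entityTable.addFamily(events);
    admin.createTable(entityTable);
    admin.close();
  }
}
{code}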
[jira] [Commented] (YARN-1564) add some basic workflow YARN services
[ https://issues.apache.org/jira/browse/YARN-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032234#comment-14032234 ] Tsuyoshi OZAWA commented on YARN-1564: -- Resubmitted to kick Jenkins CI. add some basic workflow YARN services - Key: YARN-1564 URL: https://issues.apache.org/jira/browse/YARN-1564 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Attachments: YARN-1564-001.patch Original Estimate: 24h Time Spent: 48h Remaining Estimate: 0h I've been using some alternative composite services to help build workflows of process execution in a YARN AM. They and their tests could be moved into YARN for use by others - this would make it easier to build aggregate services in an AM -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1782) CLI should let users to query cluster metrics
[ https://issues.apache.org/jira/browse/YARN-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032253#comment-14032253 ] Hadoop QA commented on YARN-1782: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650527/YARN-1782.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3994//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3994//console This message is automatically generated. CLI should let users to query cluster metrics - Key: YARN-1782 URL: https://issues.apache.org/jira/browse/YARN-1782 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1782.patch Like RM webUI and RESTful services, YARN CLI should also enable users to query the cluster metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2144: - Attachment: AM-page-preemption-info.png Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2144: - Attachment: YARN-2144.patch I've attached a patch that contains changes to show preemption information on the RM app page and in the RM log. 1) log style: {code} 2014-06-16 10:45:22,247 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Non-AM container preempted, appId=appattempt_1402886643897_0002_01, containerId=container_1402886643897_0002_01_04 {code} {code} 2014-06-16 10:45:22,247 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: AM container preempted, appId=appattempt_1402886643897_0002_01, containerId=container_1402886643897_0002_01_01 {code} 2) Info on the app page: see AM-page-preemption-info.png. Not included: 1) Persisting preemption info across RM restart/HA. 2) FairScheduler-related changes to show preemption info on the RM app page are not covered in this patch. Any feedback is welcome! Thanks Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032287#comment-14032287 ] Hadoop QA commented on YARN-2144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650540/AM-page-preemption-info.png against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3995//console This message is automatically generated. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2144: - Attachment: YARN-2144.patch Re-add patch to trigger jenkins building. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2163) WebUI: AppId should be treated as text when sort by AppId in Applications table
Wangda Tan created YARN-2163: Summary: WebUI: AppId should be treated as text when sort by AppId in Applications table Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Currently, AppId is treated as numeric, so the sort result in the applications table is sorted by id (not including the cluster timestamp), see attached screenshot. This is incorrect when multiple cluster timestamps exist. The AppId should be treated as text; we need to sort AppIds alphabetically. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2163) WebUI: AppId should be treated as text when sort by AppId in Applications table
[ https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2163: - Description: Currently, AppId is treated as numeric, so the sort result in the applications table is sorted by the int-typed id only (not including the cluster timestamp), see attached screenshot. The order of AppId on the web page should be consistent with ApplicationId.compareTo(). (was: Currently, AppId is treated as numeric, so the sort result in the applications table is sorted by id (not including the cluster timestamp), see attached screenshot. This is incorrect when multiple cluster timestamps exist. The AppId should be treated as text; we need to sort AppIds alphabetically.) WebUI: AppId should be treated as text when sort by AppId in Applications table --- Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Currently, AppId is treated as numeric, so the sort result in the applications table is sorted by the int-typed id only (not including the cluster timestamp), see attached screenshot. The order of AppId on the web page should be consistent with ApplicationId.compareTo(). -- This message was sent by Atlassian JIRA (v6.2#6252)
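For reference, ApplicationId.compareTo() orders by cluster timestamp first and then by the integer id. A small standalone sketch of a comparator over appId strings that matches that ordering; the sample IDs below are made up, and the actual fix in YARN-2163.patch changes the web UI's table sorting rather than adding a helper like this.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class AppIdOrdering {

  /**
   * Orders application ID strings of the form
   * application_<clusterTimestamp>_<id> the same way ApplicationId.compareTo()
   * does: by cluster timestamp first, then by the integer id.
   */
  static final Comparator<String> APP_ID_ORDER = new Comparator<String>() {
    @Override
    public int compare(String a, String b) {
      String[] pa = a.split("_");
      String[] pb = b.split("_");
      int byTimestamp = Long.compare(Long.parseLong(pa[1]), Long.parseLong(pb[1]));
      if (byTimestamp != 0) {
        return byTimestamp;
      }
      return Integer.compare(Integer.parseInt(pa[2]), Integer.parseInt(pb[2]));
    }
  };

  public static void main(String[] args) {
    // With two cluster timestamps present, a purely numeric sort on the
    // trailing id would interleave the two clusters' applications.
    List<String> ids = new ArrayList<String>();
    ids.add("application_1402886643897_0002");
    ids.add("application_1401000000000_0010");
    ids.add("application_1402886643897_0001");
    Collections.sort(ids, APP_ID_ORDER);
    System.out.println(ids);
  }
}
{code}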
[jira] [Updated] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().
[ https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2163: - Summary: WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo(). (was: WebUI: AppId should be treated as text when sort by AppId in Applications table) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo(). Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Currently, AppId is treated as numeric, so the sort result in applications table is sorted by int typed id only (not included cluster timestamp), see attached screenshot. Order of AppId in web page should be consistent with ApplicationId.compareTo(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().
[ https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2163: - Attachment: YARN-2163.patch WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo(). Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Attachments: YARN-2163.patch Currently, AppId is treated as numeric, so the sort result in applications table is sorted by int typed id only (not included cluster timestamp), see attached screenshot. Order of AppId in web page should be consistent with ApplicationId.compareTo(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().
[ https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2163: - Attachment: apps page.png Attached a screenshot of the apps table and a simple fix for it. WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo(). Key: YARN-2163 URL: https://issues.apache.org/jira/browse/YARN-2163 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Priority: Minor Attachments: YARN-2163.patch, apps page.png Currently, AppId is treated as numeric, so the sort result in applications table is sorted by int typed id only (not included cluster timestamp), see attached screenshot. Order of AppId in web page should be consistent with ApplicationId.compareTo(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032333#comment-14032333 ] Hadoop QA commented on YARN-2144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650541/YARN-2144.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3996//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3996//console This message is automatically generated. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2147: -- Attachment: YARN-2147-v2.patch client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147-v2.patch, YARN-2147.patch When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032441#comment-14032441 ] Chen He commented on YARN-2147: --- Thank you for the comment, [~ozawa]. Patch updated. client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147-v2.patch, YARN-2147.patch When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2147: -- Attachment: (was: YARN-2147-v2.patch) client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147.patch When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2144: - Attachment: YARN-2144.patch Attached a new patch that fixes the test failures. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch Fixed the patch. I generated it from the wrong directory. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
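To make the proposed improvement concrete, the idea is roughly: when scanning an incoming node that has enough free space for a request that is currently only reserved on some other node, drop the stale reservation and allocate for real. A toy sketch of that decision under those assumptions; the types below are stand-ins for illustration, not the actual CapacityScheduler classes touched by the patch.

{code}
/**
 * Toy model of "swap a reservation for an actual allocation".
 */
public class ReservationSwapSketch {

  static class Node {
    int availableMB;
    Integer reservedRequestId; // request currently reserved on this node, if any
  }

  static class Request {
    int id;
    int requestedMB;
  }

  /** Returns true if the request ended up allocated on {@code node}. */
  static boolean tryAllocate(Node node, Request req, Node nodeHoldingReservation) {
    if (node.availableMB >= req.requestedMB) {
      // Enough room here: release the old reservation (if any) so it no
      // longer counts against queue limits, then allocate for real.
      if (nodeHoldingReservation != null
          && Integer.valueOf(req.id).equals(nodeHoldingReservation.reservedRequestId)) {
        nodeHoldingReservation.reservedRequestId = null;
      }
      node.availableMB -= req.requestedMB;
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    Node nodeWithReservation = new Node();
    nodeWithReservation.availableMB = 512;
    nodeWithReservation.reservedRequestId = 1;

    Node incomingNode = new Node();
    incomingNode.availableMB = 4096;

    Request req = new Request();
    req.id = 1;
    req.requestedMB = 3072;

    // The incoming node can satisfy the request directly, so the stale
    // reservation is released instead of blocking further scheduling.
    System.out.println(tryAllocate(incomingNode, req, nodeWithReservation)); // true
    System.out.println(nodeWithReservation.reservedRequestId);               // null
  }
}
{code}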
[jira] [Created] (YARN-2164) Add switch 'restart' for yarn-daemon.sh
Jun Gong created YARN-2164: -- Summary: Add switch 'restart' for yarn-daemon.sh Key: YARN-2164 URL: https://issues.apache.org/jira/browse/YARN-2164 Project: Hadoop YARN Issue Type: Improvement Reporter: Jun Gong Priority: Minor For convenience, add a 'restart' switch to yarn-daemon.sh. E.g. we could use yarn-daemon.sh restart nodemanager instead of yarn-daemon.sh stop nodemanager; yarn-daemon.sh start nodemanager. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032603#comment-14032603 ] Tassapol Athiapinya commented on YARN-2144: --- [~leftnoteasy] Can you please clarify these points for me? - On the AM page, does Resource Preempted from Current Attempt mean Total Resource Preempted from the Latest AM Attempt? Can it show only the data point from the current (is it the latest?) attempt? - Can you change #Container Preempted from Current Attempt: to Number of Containers Preempted from Current (Latest) Attempt? The # syntax may be hard to comprehend for a wider group of users. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command
[ https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-853: Fix Version/s: 0.23.11 Thanks, Devaraj! I committed this to branch-0.23 as well. maximum-am-resource-percent doesn't work after refreshQueues command Key: YARN-853 URL: https://issues.apache.org/jira/browse/YARN-853 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.1.0-beta, 0.23.11 Attachments: YARN-853-1.patch, YARN-853-2.patch, YARN-853-3.patch, YARN-853-4.patch, YARN-853.patch If we update the yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.queue-path.maximum-am-resource-percent configuration and then do the refreshQueues, it uses the new config value to calculate Max Active Applications and Max Active Applications Per User. If we add a new node after issuing the 'rmadmin -refreshQueues' command, it uses the old maximum-am-resource-percent config value to calculate Max Active Applications and Max Active Applications Per User. -- This message was sent by Atlassian JIRA (v6.2#6252)
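For context, the limits in question are derived from the configured percent and the current cluster size, so the refreshed percent has to be picked up whenever the computation re-runs, for example when a new node registers. A rough sketch of that calculation, which approximates rather than reproduces the CapacityScheduler's CSQueueUtils logic:

{code}
/**
 * Rough sketch (not the exact CapacityScheduler code) of how the maximum
 * number of active applications is derived from the AM resource percent,
 * the queue's absolute capacity, and the current cluster size.
 */
public class MaxActiveAppsSketch {

  static int computeMaxActiveApplications(long clusterMemoryMB,
                                          long minAllocationMB,
                                          float maxAMResourcePercent,
                                          float absoluteQueueCapacity) {
    double containers = (double) clusterMemoryMB / minAllocationMB;
    return Math.max(
        (int) Math.ceil(containers * maxAMResourcePercent * absoluteQueueCapacity), 1);
  }

  public static void main(String[] args) {
    // Example: a 10% AM limit on a queue with 50% absolute capacity.
    // 16 GB cluster, 1 GB minimum allocation -> prints 1.
    System.out.println(computeMaxActiveApplications(16 * 1024, 1024, 0.1f, 0.5f));
    // After a new node doubles the cluster, recomputing with the same
    // (refreshed) percent -> prints 2.
    System.out.println(computeMaxActiveApplications(32 * 1024, 1024, 0.1f, 0.5f));
  }
}
{code}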
[jira] [Updated] (YARN-2164) Add switch 'restart' for yarn-daemon.sh
[ https://issues.apache.org/jira/browse/YARN-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-2164: --- Attachment: YARN-2164.patch Add switch 'restart' for yarn-daemon.sh Key: YARN-2164 URL: https://issues.apache.org/jira/browse/YARN-2164 Project: Hadoop YARN Issue Type: Improvement Reporter: Jun Gong Priority: Minor Attachments: YARN-2164.patch For convenience, add a 'restart' switch to yarn-daemon.sh. E.g. we could use yarn-daemon.sh restart nodemanager instead of yarn-daemon.sh stop nodemanager; yarn-daemon.sh start nodemanager. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032611#comment-14032611 ] Hadoop QA commented on YARN-2144: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650581/YARN-2144.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3998//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3998//console This message is automatically generated. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032646#comment-14032646 ] Hadoop QA commented on YARN-1769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650590/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3999//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3999//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3999//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
Karam Singh created YARN-2165: - Summary: Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero. Currently, if we set yarn.timeline-service.ttl-ms=0 or yarn.timeline-service.ttl-ms=-86400, the Timeline Server starts successfully without complaining: {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At startup the timeline server should validate that yarn.timeline-service.ttl-ms is greater than zero; otherwise, especially for negative values, the discard-old-entities timestamp will be set to a future value, which may lead to inconsistency in behavior: {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
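A sketch of the kind of start-up validation being requested, reading the property by name and failing fast on non-positive values. Wiring this into the timeline store's init is omitted, so treat it as illustrative only; the default value shown is also just an example.

{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineTtlValidation {

  static final String TTL_MS = "yarn.timeline-service.ttl-ms";
  static final long EXAMPLE_DEFAULT_TTL_MS = 7 * 24 * 60 * 60 * 1000L; // 7 days, illustrative

  /** Returns the configured TTL, rejecting zero or negative values. */
  static long getValidatedTtl(Configuration conf) {
    long ttl = conf.getLong(TTL_MS, EXAMPLE_DEFAULT_TTL_MS);
    if (ttl <= 0) {
      // Fail fast instead of silently starting a deletion thread whose
      // cut-off timestamp lies in the future.
      throw new IllegalArgumentException(
          TTL_MS + " must be greater than zero, but was " + ttl);
    }
    return ttl;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setLong(TTL_MS, -86400L);
    System.out.println(getValidatedTtl(conf)); // throws IllegalArgumentException
  }
}
{code}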
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032821#comment-14032821 ] Hadoop QA commented on YARN-1769: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650615/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4001//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4001//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store
Karam Singh created YARN-2166: - Summary: Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store Key: YARN-2166 URL: https://issues.apache.org/jira/browse/YARN-2166 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store other if we start timelineserver with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000 Timeline starts but Thread.sleep call in EntityDeletionThread.run keep on throwing UnCaughtExcpetion -ive value {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store
[ https://issues.apache.org/jira/browse/YARN-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karam Singh updated YARN-2166: -- Description: Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store other if we start timelineserver with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000 Timeline starts but Thread.sleep call in EntityDeletionThread.run keep on throwing UncaughtException -ive value {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} was: Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store other if we start timelineserver with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000 Timeline starts but Thread.sleep call in EntityDeletionThread.run keep on throwing UnCaughtExcpetion -ive value {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store - Key: YARN-2166 URL: https://issues.apache.org/jira/browse/YARN-2166 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store other if we start timelineserver with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000 Timeline starts but Thread.sleep call in EntityDeletionThread.run keep on throwing UncaughtException -ive value {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1898) Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM
[ https://issues.apache.org/jira/browse/YARN-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032909#comment-14032909 ] Karthik Kambatla commented on YARN-1898: I agree with Robert here. We wouldn't be able to track metrics and jmx of the Standby RM if we redirect them. [~xgong], [~acmurthy] - what do you think? Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM - Key: YARN-1898 URL: https://issues.apache.org/jira/browse/YARN-1898 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Yesha Vora Assignee: Xuan Gong Fix For: 2.4.1 Attachments: YARN-1898.1.patch, YARN-1898.2.patch, YARN-1898.3.patch, YARN-1898.addendum.patch, YARN-1898.addendum.patch Standby RM links /conf, /stacks, /logLevel, /metrics, /jmx is redirected to Active RM. It should not be redirected to Active RM -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2159) allocateContainer() in SchedulerNode needs a clearer LOG.info message
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032913#comment-14032913 ] Karthik Kambatla commented on YARN-2159: +1. Committing this. allocateContainer() in SchedulerNode needs a clearer LOG.info message - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info("Assigned container " + container.getId() + " of capacity " + container.getResource() + " on host " + rmNode.getNodeAddress() + ", which currently has " + numContainers + " containers, " + getUsedResource() + " used and " + getAvailableResource() + " available"); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity <memory:1536, vCores:1> on host machine.host.domain.com:8041, which currently has 18 containers, <memory:27648, vCores:18> used and <memory:3072, vCores:0> available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, <memory:27648, vCores:18> used and <memory:3072, vCores:0> available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
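Rendering the suggested phrasing as code, with plain strings standing in for the YARN ContainerId/Resource types; this illustrates the wording only and is not necessarily the exact committed change.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class AllocationLogSketch {

  private static final Log LOG = LogFactory.getLog(AllocationLogSketch.class);

  /** Plain strings stand in for ContainerId/Resource toString() output. */
  static void logAllocation(String containerId, String capacity, String host,
                            int numContainers, String used, String available) {
    // "after allocation" makes it clear the figures are post-assignment,
    // so a fully used node no longer reads oddly.
    LOG.info("Assigned container " + containerId + " of capacity " + capacity
        + " on host " + host + ", which has " + numContainers + " containers, "
        + used + " used and " + available + " available after allocation");
  }

  public static void main(String[] args) {
    logAllocation("container_1400000000000_0009_01_000018",
        "<memory:1536, vCores:1>", "machine.host.domain.com:8041",
        18, "<memory:27648, vCores:18>", "<memory:3072, vCores:0>");
  }
}
{code}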
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032918#comment-14032918 ] Mayank Bansal commented on YARN-2022: - HI [~sunilg] Thanks for the patch. Overall looks ok however I think we need to add the test case for AM percentage per queue as well. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
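A small self-contained sketch of the ordering idea discussed here, preferring task containers as preemption victims and touching AM containers only last; the Candidate type and its fields are purely illustrative, not the ProportionalCapacityPreemptionPolicy classes:
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class Candidate {
  final String containerId;
  final boolean isAMContainer;

  Candidate(String containerId, boolean isAMContainer) {
    this.containerId = containerId;
    this.isAMContainer = isAMContainer;
  }
}

class PreemptionOrderSketch {
  // Task containers sort before AM containers, so an AM is only picked as a
  // victim once no ordinary task containers are left to preempt.
  static List<Candidate> order(List<Candidate> candidates) {
    List<Candidate> sorted = new ArrayList<>(candidates);
    sorted.sort(Comparator.comparing((Candidate c) -> c.isAMContainer));
    return sorted;
  }
}
{code}
In the scenario above, such an ordering would take map containers from J3 and J2 before ever considering J3's AM.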
[jira] [Updated] (YARN-2159) Better logging in SchedulerNode#allocateContainer
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2159: --- Summary: Better logging in SchedulerNode#allocateContainer (was: allocateContainer() in SchedulerNode needs a clearer LOG.info message) Better logging in SchedulerNode#allocateContainer - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2159) Better logging in SchedulerNode#allocateContainer
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032930#comment-14032930 ] Hudson commented on YARN-2159: -- FAILURE: Integrated in Hadoop-trunk-Commit #5712 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5712/]) YARN-2159. Better logging in SchedulerNode#allocateContainer. (Ray Chiang via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603003) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java Better logging in SchedulerNode#allocateContainer - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2159) Better logging in SchedulerNode#allocateContainer
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032986#comment-14032986 ] Tsuyoshi OZAWA commented on YARN-2159: -- Thanks Ray for the contribution, and thanks Karthik for the review. Better logging in SchedulerNode#allocateContainer - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033069#comment-14033069 ] Jian He commented on YARN-1885: --- lgtm, +1. other than a minor code comment, fixed myself, waiting for jenkins to commit.. RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1885: -- Attachment: YARN-1885.patch RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033101#comment-14033101 ] Jian He commented on YARN-2052: --- Application itself may possibly use Container.getId to differentiate the containers, two containers allocated by two RMs may have the same id integer, then the application logic will break. will this be fine? If we are taking this approach of adding a new field to differentiate the containerId, we should at least document that ContainerId.getid is not the way to differentiate containers. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
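To illustrate the concern about Container.getId, here is a toy model of an id that is only unique together with an epoch-style field; the class and field names are illustrative, not the real ContainerId API:
{code}
import java.util.Objects;

// Toy model: after a work-preserving restart, the integer id alone can repeat,
// so identity has to include the RM epoch as well.
final class EpochContainerId {
  final long epoch; // bumped on every RM restart
  final int id;     // monotonically increasing sequence within one epoch

  EpochContainerId(long epoch, int id) {
    this.epoch = epoch;
    this.id = id;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof EpochContainerId)) {
      return false;
    }
    EpochContainerId other = (EpochContainerId) o;
    return epoch == other.epoch && id == other.id; // the id alone is not sufficient
  }

  @Override
  public int hashCode() {
    return Objects.hash(epoch, id);
  }
}
{code}
Two containers allocated before and after a restart can then share the same id while differing in epoch, which is exactly why the comment asks that ContainerId.getId not be documented as a way to differentiate containers.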
[jira] [Commented] (YARN-1339) Recover DeletionService state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033106#comment-14033106 ] Junping Du commented on YARN-1339: -- Thanks for addressing my comments, [~jlowe]! +1. The v6 patch LGTM, will commit it shortly. Recover DeletionService state upon nodemanager restart -- Key: YARN-1339 URL: https://issues.apache.org/jira/browse/YARN-1339 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1339.patch, YARN-1339v2.patch, YARN-1339v3-and-YARN-1987.patch, YARN-1339v4.patch, YARN-1339v5.patch, YARN-1339v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2159) Better logging in SchedulerNode#allocateContainer
[ https://issues.apache.org/jira/browse/YARN-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033109#comment-14033109 ] Ray Chiang commented on YARN-2159: -- Great. Thanks! Better logging in SchedulerNode#allocateContainer - Key: YARN-2159 URL: https://issues.apache.org/jira/browse/YARN-2159 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie, supportability Fix For: 2.5.0 Attachments: YARN2159-01.patch This bit of code: {quote} LOG.info(Assigned container + container.getId() + of capacity + container.getResource() + on host + rmNode.getNodeAddress() + , which currently has + numContainers + containers, + getUsedResource() + used and + getAvailableResource() + available); {quote} results in a line like: {quote} 2014-05-30 16:17:43,573 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_14000_0009_01_00 of capacity memory:1536, vCores:1 on host machine.host.domain.com:8041, which currently has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available {quote} That message is fine in most cases, but looks pretty bad after the last available allocation, since it says something like vCores:0 available. Here is one suggested phrasing - which has 18 containers, memory:27648, vCores:18 used and memory:3072, vCores:0 available after allocation -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1885: -- Attachment: YARN-1885.patch RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
Junping Du created YARN-2167: Summary: LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du In NMLeveldbStateStoreService#loadLocalizationState(), we have a LeveldbIterator to read the NM's localization state, but it does not get closed in a finally block. We should close this connection to the DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
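The shape of the proposed fix, sketched rather than quoted from the patch; the iterator construction and the loop body are placeholders, only the try/finally pattern is the point:
{code}
LeveldbIterator iter = new LeveldbIterator(db); // construction shown for illustration only
try {
  // ... existing code that walks the localization keys and rebuilds NM state ...
} finally {
  iter.close(); // always release the leveldb handle, even if loading throws
}
{code}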
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033117#comment-14033117 ] Hadoop QA commented on YARN-1885: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650661/YARN-1885.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4002//console This message is automatically generated. RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2167: - Attachment: YARN-2167.patch Upload a quick patch to fix it. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033127#comment-14033127 ] Jason Lowe commented on YARN-2167: -- +1 pending Jenkins. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033157#comment-14033157 ] Hadoop QA commented on YARN-2167: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650677/YARN-2167.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4004//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4004//console This message is automatically generated. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033165#comment-14033165 ] Jian He commented on YARN-2157: --- Thanks for the patch! Some suggestions on the patch: ClusterMetrics shows the metrics of the YARN cluster? such as… {code} ClusterMetrics shows the statistics of NodeManagers from the + ResourceManager's perspective {code} Do you mean the queue name? if so, we can use queue name. {code} queue identifier {code} Can you clarify more about what this format means? {code} running_num {code} Can you please clarify the definition of pending applications? i.e. an application that has not yet been assigned any containers. Total number of applications killed - Total number of killed applications, similarly for “Total number of applications failed” can you clarify the meaning of PendingMB, PendingVCores, PendingContainers also? i.e. the pending resource requests that are not yet fulfilled by the scheduler. allocatedContainers can be put before allocatedGB for consistency. {code} +*-+--+ +|allocatedGB | Current allocated memory in GB +*-+--+ +|allocatedContainers | Current number of allocated containers +*-+--+ {code} Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: YARN-2157.patch YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1339) Recover DeletionService state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033178#comment-14033178 ] Hadoop QA commented on YARN-1339: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649673/YARN-1339v6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4005//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4005//console This message is automatically generated. Recover DeletionService state upon nodemanager restart -- Key: YARN-1339 URL: https://issues.apache.org/jira/browse/YARN-1339 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1339.patch, YARN-1339v2.patch, YARN-1339v3-and-YARN-1987.patch, YARN-1339v4.patch, YARN-1339v5.patch, YARN-1339v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033185#comment-14033185 ] Hadoop QA commented on YARN-1885: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650674/YARN-1885.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4003//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4003//console This message is automatically generated. RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1898) Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM
[ https://issues.apache.org/jira/browse/YARN-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1898. --- Resolution: Fixed Unfortunately jmx is a mess right now - it includes both machine metrics together with what should usually belong to /metrics. So you would get some stale metrics related to YARN if we don't redirect it to the active. Not sure what the right fix is without explicitly listing down and reasoning about all the stuff that is exposed in /jmx. IAC, let's open a new ticket and link to this one. Tx. Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM - Key: YARN-1898 URL: https://issues.apache.org/jira/browse/YARN-1898 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Yesha Vora Assignee: Xuan Gong Fix For: 2.4.1 Attachments: YARN-1898.1.patch, YARN-1898.2.patch, YARN-1898.3.patch, YARN-1898.addendum.patch, YARN-1898.addendum.patch Standby RM links /conf, /stacks, /logLevel, /metrics, /jmx is redirected to Active RM. It should not be redirected to Active RM -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033190#comment-14033190 ] Jian He commented on YARN-2144: --- Haven't looked at the patch. YARN-1809 is adding the attempt UI, Maybe the app UI should show the total preempted containers info, and attempt UI should show each attempt's preempted containers info ? Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
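For illustration, the kind of INFO-level line the description asks for might look like the sketch below; the method and variable names are assumptions, and LOG stands for the class's existing logger:
{code}
// Illustration only: record each preemption with the container id and whether it
// was the AM container, so the event can be found while the application still runs.
void logPreemption(String containerId, String applicationId, boolean isAMContainer) {
  LOG.info("Preempting " + (isAMContainer ? "AM container " : "task container ")
      + containerId + " of application " + applicationId);
}
{code}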
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033233#comment-14033233 ] Hudson commented on YARN-1885: -- FAILURE: Integrated in Hadoop-trunk-Commit #5714 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5714/]) YARN-1885. Fixed a bug that RM may not send application-clean-up signal to NMs where the completed applications previously ran in case of RM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603028) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestProtocolRecords.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestRegisterNodeManagerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppRunningOnNodeEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptContainerAcquiredEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStartedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java *
[jira] [Created] (YARN-2168) SCM/Client/NM/Admin protocols
Chris Trezzo created YARN-2168: -- Summary: SCM/Client/NM/Admin protocols Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Define and implement the following protocols and protocol messages using protobufs: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.2#6252)
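For orientation, a rough Java sketch of what the client-facing protocol listed above might look like; the empty interfaces stand in for the protobuf-backed records, and none of this is the committed API:
{code}
// Illustrative stand-ins for the protobuf-backed request/response records.
interface UseSharedCacheResourceRequest {}
interface UseSharedCacheResourceResponse {}
interface ReleaseSharedCacheResourceRequest {}
interface ReleaseSharedCacheResourceResponse {}

// Sketch of ClientSCMProtocol: how a YARN client could claim and release
// resources in the shared cache manager.
interface ClientSCMProtocol {
  UseSharedCacheResourceResponse use(UseSharedCacheResourceRequest request);

  ReleaseSharedCacheResourceResponse release(ReleaseSharedCacheResourceRequest request);
}
{code}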
[jira] [Updated] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2168: --- Attachment: YARN-2168-trunk-v1.patch Attached is v1 patch based off of trunk. SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2168-trunk-v1.patch Define and implement the following protocols and protocol messages using protobufs: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033286#comment-14033286 ] Junping Du commented on YARN-2167: -- The patch is very tiny and straight-forward, so no need for additional unit test. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033293#comment-14033293 ] Hadoop QA commented on YARN-2168: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650693/YARN-2168-trunk-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4006//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4006//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4006//console This message is automatically generated. SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2168-trunk-v1.patch Define and implement the following protocols and protocol messages using protobufs: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1339) Recover DeletionService state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033295#comment-14033295 ] Hudson commented on YARN-1339: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5715 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5715/]) YARN-1339. Recover DeletionService state upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603036) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java Recover DeletionService state upon nodemanager restart -- Key: YARN-1339 URL: https://issues.apache.org/jira/browse/YARN-1339 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.5.0 Attachments: YARN-1339.patch, YARN-1339v2.patch, YARN-1339v3-and-YARN-1987.patch, YARN-1339v4.patch, YARN-1339v5.patch, YARN-1339v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033299#comment-14033299 ] Wangda Tan commented on YARN-1885: -- Thanks [~vinodkv] and [~jianhe] for review and commit! RM may not send the app-finished signal after RM restart to some nodes where the application ran before RM restarts --- Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033306#comment-14033306 ] Wangda Tan commented on YARN-2144: -- Hi [~tassapola], bq. In AM page, Does Resource Preempted from Current Attempt mean Total Resource Preempted from Latest AM attempt? Can it show only data point from current (is it latest?) attempt? Yes, Yes, it can only show data point from current(latest) attempt. bq. Can you change #Container Preempted from Current Attempt: to Number of Containers Preempted from Current(Latest) Attempt? # syntax maybe hard to comprehend for wider group of user. Thanks for this comment, I agree with you. I'll address this comment later. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2157: Attachment: YARN-2157.2.patch Thanks [~jianhe] for the suggestions! Updated the patch. Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: YARN-2157.2.patch, YARN-2157.patch YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033326#comment-14033326 ] Wangda Tan commented on YARN-2144: -- bq. Haven't looked at the patch. YARN-1809 is adding the attempt UI, Maybe the app UI should show the total preempted containers info, and attempt UI should show each attempt's preempted containers info? Thanks for pointing the attempt UI JIRA. I think RM will cleanup application's resource usage at the beginning of each attempt start, so it should make sense to show latest attempt's preempted containers info on app UI. And after we can persist preemption info across RM restart and YARN-1807 committed, we can show each attempt's preempted containers info on attempts UI. Do you agree? Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. 1. For debugging purpose, RM should log this. 2. For administrative purpose, RM webpage should have a page to show recent preemption events. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033327#comment-14033327 ] Wangda Tan commented on YARN-2074: -- Hi Jian, I've reviewed your patch, one question, Is following a bug? {code} int exitStatus = ContainerExitStatus.PREEMPTED; switch (event.getType()) { case LAUNCH_FAILED: RMAppAttemptLaunchFailedEvent launchFaileEvent = (RMAppAttemptLaunchFailedEvent) event; diags = launchFaileEvent.getMessage(); break; {code} amContainerExitStatus will be set to ContainerExitStatus.PREEMPTED in any case. If it's a bug, I think we should cover a AM completed/fail case and it shouldn't be treated as preempted. Thanks, Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
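To make the concern concrete, one possible shape of a fix is sketched below: default to a neutral exit status and only use PREEMPTED when the event actually indicates preemption. The event types and the diags variable come from the quoted code; the finishedContainerStatus accessor is a placeholder, and this is not the committed patch:
{code}
// Sketch only: do not default every path to PREEMPTED.
int exitStatus = ContainerExitStatus.INVALID;
switch (event.getType()) {
  case LAUNCH_FAILED:
    RMAppAttemptLaunchFailedEvent launchFailedEvent =
        (RMAppAttemptLaunchFailedEvent) event;
    diags = launchFailedEvent.getMessage();
    break;
  case CONTAINER_FINISHED:
    // Placeholder: carry the real exit status reported for the finished AM container.
    exitStatus = finishedContainerStatus.getExitStatus();
    break;
  default:
    break;
}
{code}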
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033328#comment-14033328 ] Jason Lowe commented on YARN-2167: -- +1 lgtm. Committing this. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1405#comment-1405 ] Tsuyoshi OZAWA commented on YARN-2074: -- {quote} amContainerExitStatus will be set to ContainerExitStatus.PREEMPTED in any case. If it's a bug, I think we should cover a AM completed/fail case and it shouldn't be treated as preempted. Thanks, {quote} [~wangda], thank you for pointing it out. I'll check it. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033342#comment-14033342 ] Junping Du commented on YARN-2167: -- Thanks [~jlowe] for review and commit! LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Fix For: 3.0.0, 2.5.0 Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2167) LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block
[ https://issues.apache.org/jira/browse/YARN-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033346#comment-14033346 ] Hudson commented on YARN-2167: -- FAILURE: Integrated in Hadoop-trunk-Commit #5716 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5716/]) YARN-2167. LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block. Contributed by Junping Du (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603039) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java LeveldbIterator should get closed in NMLeveldbStateStoreService#loadLocalizationState() within finally block Key: YARN-2167 URL: https://issues.apache.org/jira/browse/YARN-2167 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Fix For: 3.0.0, 2.5.0 Attachments: YARN-2167.patch In NMLeveldbStateStoreService#loadLocalizationState(), we have LeveldbIterator to read NM's localization state but it is not get closed in finally block. We should close this connection to DB as a common practice. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2074: -- Attachment: YARN-2074.6.patch thanks for pointing out! fixed it. One thing to note here is that AM ContainerExitStatus for succeeded app is not saved in state store. only containerExitStatus for failed apps is saved. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033356#comment-14033356 ] Hadoop QA commented on YARN-2074: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650717/YARN-2074.6.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4008//console This message is automatically generated. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033358#comment-14033358 ] Hadoop QA commented on YARN-2157: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650712/YARN-2157.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4007//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4007//console This message is automatically generated. Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: YARN-2157.2.patch, YARN-2157.patch YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033377#comment-14033377 ] Zhijie Shen commented on YARN-2165: --- [~karams], how about gathering the similar validation issues (YARN-2166) here? Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero - Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero. Currently, if yarn.timeline-service.ttl-ms=0 or yarn.timeline-service.ttl-ms=-86400 is set, the timeline server starts successfully, merely logging: {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At startup, the timeline server should validate that yarn.timeline-service.ttl-ms > 0; otherwise, especially for a negative value, the timestamp used to discard old entities will be set to a future value, which may lead to inconsistent behavior: {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
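A sketch of the validation being requested, assuming it runs when the store initializes (the property name comes from this issue; the default value and surrounding class are illustrative, not the actual patch):
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: fail fast at init time instead of starting the deletion thread
// with a zero or negative TTL.
public class TtlValidationSketch {
  static long validatedTtlMs(Configuration conf) {
    long ttlMs = conf.getLong("yarn.timeline-service.ttl-ms", 7L * 24 * 60 * 60 * 1000);
    if (ttlMs <= 0) {
      throw new IllegalArgumentException(
          "yarn.timeline-service.ttl-ms must be greater than zero, but was " + ttlMs);
    }
    return ttlMs;
  }
}
{code}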
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033396#comment-14033396 ] Hadoop QA commented on YARN-2074: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650719/YARN-2074.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4009//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4009//console This message is automatically generated. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2162) Fair Scheduler :ability to configure minResources and maxResources in terms of percentage
[ https://issues.apache.org/jira/browse/YARN-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033399#comment-14033399 ] Maysam Yabandeh commented on YARN-2162: --- If we add this feature, it should certainly be optional. Think of the following scenario, which is quite common: # The cluster is divided among 100 queues # Queue 1 requires more resources and asks for more capacity # More machines are added to the cluster to meet Queue 1's new demand # The min and max resources of Queue 1 are updated accordingly If the queues' min and max resources are expressed in terms of percentages, then every queue has to update its percentages. Fair Scheduler :ability to configure minResources and maxResources in terms of percentage - Key: YARN-2162 URL: https://issues.apache.org/jira/browse/YARN-2162 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Ashwin Shankar Labels: scheduler minResources and maxResources in fair scheduler configs are expressed in terms of absolute numbers (X mb, Y vcores). As a result, when we expand or shrink our hadoop cluster, we need to recalculate and change minResources/maxResources accordingly, which is pretty inconvenient. We can circumvent this problem if we can (optionally) configure these properties in terms of percentage of cluster capacity. -- This message was sent by Atlassian JIRA (v6.2#6252)
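A small worked example of the concern in the comment above (all numbers are made up): with percentage-based minResources, expanding the cluster silently changes every queue's absolute guarantee unless every percentage is recomputed.
{code}
public class PercentMinResourceExample {
  public static void main(String[] args) {
    double minPercent = 0.10;   // a queue configured with 10% as its minResources
    int clusterGb = 1000;       // cluster memory before expansion
    int expandedGb = 1200;      // after adding machines for one busy queue
    System.out.println("guarantee before: " + (minPercent * clusterGb) + " GB");   // 100.0 GB
    System.out.println("guarantee after:  " + (minPercent * expandedGb) + " GB");  // 120.0 GB
    // Every queue's absolute guarantee shifts, so all percentages must be revisited.
  }
}
{code}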
[jira] [Updated] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2165: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1530 Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero - Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero. Currently, if yarn.timeline-service.ttl-ms=0 or yarn.timeline-service.ttl-ms=-86400 is set, the timeline server starts successfully, merely logging: {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At startup, the timeline server should validate that yarn.timeline-service.ttl-ms > 0; otherwise, especially for a negative value, the timestamp used to discard old entities will be set to a future value, which may lead to inconsistent behavior: {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store
[ https://issues.apache.org/jira/browse/YARN-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2166: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1530 Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store - Key: YARN-2166 URL: https://issues.apache.org/jira/browse/YARN-2166 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Karam Singh Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when leveldb is used for the timeline store. Otherwise, if we start the timeline server with yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000, the timeline server starts, but the Thread.sleep call in EntityDeletionThread.run keeps throwing an uncaught exception for the negative value: {code} 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[Thread-4,5,main] threw an Exception. java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
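A standalone illustration of the failure mode quoted above (not YARN code): Thread.sleep rejects negative timeouts, which is exactly what the deletion thread keeps hitting, so the interval should be validated before the thread is started.
{code}
public class NegativeSleepDemo {
  public static void main(String[] args) throws InterruptedException {
    long ttlIntervalMs = -5000L;  // e.g. a misconfigured ttl-interval-ms
    // Throws java.lang.IllegalArgumentException: timeout value is negative,
    // the same exception the EntityDeletionThread keeps hitting.
    Thread.sleep(ttlIntervalMs);
  }
}
{code}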
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033402#comment-14033402 ] Wangda Tan commented on YARN-2074: -- [~jianhe], the changes almost LGTM; one comment on the tests: could you add a test case where an app has several attempts, some failed and some preempted, so we can check that RMAppAttemptImpl.isLastAttempt is set properly? Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033403#comment-14033403 ] Wangda Tan commented on YARN-2152: -- Thanks [~jianhe] for the update, LGTM, +1. Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch, YARN-2152.1.patch, YARN-2152.2.patch Container information such as container priority and container start time cannot be recovered, because the NM today does not send such container information across on NM registration when RM recovery happens. -- This message was sent by Atlassian JIRA (v6.2#6252)
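A hypothetical illustration of the kind of per-container information the NM would need to report on re-registration for the RM to recover it; the class and field names below are invented for illustration and are not the actual YARN API.
{code}
// Hypothetical data carrier, not the actual YARN-2152 change.
public class RecoveredContainerInfo {
  private final int priority;       // container priority at allocation time
  private final long creationTime;  // container start time on the NM

  public RecoveredContainerInfo(int priority, long creationTime) {
    this.priority = priority;
    this.creationTime = creationTime;
  }

  public int getPriority() { return priority; }
  public long getCreationTime() { return creationTime; }
}
{code}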
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033421#comment-14033421 ] Jian He commented on YARN-2074: --- testPreemptedAMRestartOnRMRestart is already doing this with multiple attempts? Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033423#comment-14033423 ] Jian He commented on YARN-2074: --- sorry, I meant testAMPreemptedNotCountedForAMFailures Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: YARN-2022.6.patch Thank you [~mayank_bansal]. I have updated the patch with a new test case named testAMResourcePercentForSkippedAMContainers for the AMResourcePercent check. Kindly review. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, job J3 will get killed, including its AM. It is better if the AM can be given the least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted instead. Later, when the cluster is free, maps can be allocated to these jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
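A sketch of the ordering idea behind this issue (not ProportionalCapacityPreemptionPolicy itself; the Candidate type and method are invented for illustration): when several containers are marked for preemption, visit non-AM containers first so AM containers are only killed as a last resort.
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AmLastOrderingSketch {
  static class Candidate {
    final String containerId;
    final boolean isAmContainer;
    Candidate(String containerId, boolean isAmContainer) {
      this.containerId = containerId;
      this.isAmContainer = isAmContainer;
    }
  }

  // Order candidates so non-AM containers are considered for preemption first.
  static List<Candidate> orderForPreemption(List<Candidate> candidates) {
    List<Candidate> ordered = new ArrayList<>(candidates);
    // Boolean ordering puts false before true, so AM containers sort to the tail.
    ordered.sort(Comparator.comparing((Candidate c) -> c.isAmContainer));
    return ordered;
  }
}
{code}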