[jira] [Updated] (YARN-2192) TestRMHA fails when run with a mix of Schedulers
[ https://issues.apache.org/jira/browse/YARN-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2192: Description: If the test is run with the FairScheduler, some of the tests fail because the metrics system objects are shared across tests and not destroyed completely. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} was: Some TestRMHA tests assume the CapacityScheduler. If the test is run with multiple schedulers, some of the tests fail because the metrics system objects are shared across tests and fail as below. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} TestRMHA fails when run with a mix of Schedulers Key: YARN-2192 URL: https://issues.apache.org/jira/browse/YARN-2192 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot If the test is run with the FairScheduler, some of the tests fail because the metrics system objects are shared across tests and not destroyed completely. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
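As an illustration of the failure mode (not the attached patch): the FairScheduler re-registers QueueMetrics,q0=root against the statically shared DefaultMetricsSystem, so any test that leaves the previous RM's metrics registered trips the "already exists" error. A minimal teardown sketch, assuming the usual test-only hooks QueueMetrics.clearQueueMetrics() and DefaultMetricsSystem.shutdown() are available on this branch (the test class name is illustrative): {code}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.junit.After;

public class TestRMHAMetricsCleanupSketch {

  @After
  public void tearDown() {
    // Drop the statically cached root queue metrics so the next test
    // (possibly using a different scheduler) can register them again.
    QueueMetrics.clearQueueMetrics();
    // Tear down the metrics system left behind by the previous RM instance.
    DefaultMetricsSystem.shutdown();
  }
}
{code}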
[jira] [Updated] (YARN-2192) TestRMHA fails when run with a mix of Schedulers
[ https://issues.apache.org/jira/browse/YARN-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2192: Attachment: YARN-2192.patch Fix the cleanup of the metrics by removing the conditional that would not work with the FairScheduler. TestRMHA fails when run with a mix of Schedulers Key: YARN-2192 URL: https://issues.apache.org/jira/browse/YARN-2192 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2192.patch If the test is run with the FairScheduler, some of the tests fail because the metrics system objects are shared across tests and not destroyed completely. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2192) TestRMHA fails when run with a mix of Schedulers
[ https://issues.apache.org/jira/browse/YARN-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040476#comment-14040476 ] Hadoop QA commented on YARN-2192: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651933/YARN-2192.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4045//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4045//console This message is automatically generated. TestRMHA fails when run with a mix of Schedulers Key: YARN-2192 URL: https://issues.apache.org/jira/browse/YARN-2192 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2192.patch If the test is run with the FairScheduler, some of the tests fail because the metrics system objects are shared across tests and not destroyed completely. {code} Error Message Metrics source QueueMetrics,q0=root already exists! Stacktrace org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1281) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:427) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2191: - Attachment: YARN-2191.patch Uploaded a simplified patch and re-kicked jenkins. Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when the restart happens before the app finishes. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completes on NM1 5. RM1 finishes restarting, NM1 reports container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
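For reference, a rough sketch of how the sequence above is usually wired up with MockRM/MockNM and a shared MemoryRMStateStore. This is not the attached patch; the helper signatures are from memory and should be treated as approximate, and the NM re-registration/resync details after the restart are elided: {code}
Configuration conf = new YarnConfiguration();
conf.setBoolean(YarnConfiguration.RECOVERY_ENABLED, true);
conf.set(YarnConfiguration.RM_STORE, MemoryRMStateStore.class.getName());
MemoryRMStateStore memStore = new MemoryRMStateStore();
memStore.init(conf);

// 1-2. Start RM1, register NM1/NM2, submit app1 and launch its AM on NM1.
MockRM rm1 = new MockRM(conf, memStore);
rm1.start();
MockNM nm1 = rm1.registerNode("127.0.0.1:1234", 8192);
MockNM nm2 = rm1.registerNode("127.0.0.1:5678", 8192);
RMApp app1 = rm1.submitApp(200);
MockAM am1 = MockRM.launchAndRegisterAM(app1, rm1, nm1);

// 3. "Restart" RM1 by starting a second RM backed by the same state store.
MockRM rm2 = new MockRM(conf, memStore);
rm2.start();
nm1.setResourceTrackerService(rm2.getResourceTrackerService());
nm2.setResourceTrackerService(rm2.getResourceTrackerService());
// ... NM re-registration/resync with rm2 elided ...

// 4-5. NM1 reports the completed AM container (container-0), driving app1 to completion.
nm1.nodeHeartbeat(am1.getApplicationAttemptId(), 1, ContainerState.COMPLETE);

// 6. Both NMs should eventually be told to clean up app1 via heartbeat.
NodeHeartbeatResponse resp = nm2.nodeHeartbeat(true);
Assert.assertTrue(resp.getApplicationsToCleanup().contains(app1.getApplicationId()));
{code}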
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040551#comment-14040551 ] Remus Rusanu commented on YARN-1972: [~vinodkv] I see there is no container executor topic at src/site/apt. I'm thinking of writing the WCE as part of a 'secure container' topic, which would describe LCE as well. Is this OK? Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1972.1.patch, YARN-1972.2.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changing the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changing the actual container run command to use the 'createAsUser' command of the winutils task instead of 'create' * running the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to the LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files The approach to the WCE came from practical trial and error. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop environment container executions. The job container itself is already dealing with this via a so-called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launched as a separate container, the same issue had to be resolved and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work the nodemanager must run as a service principal that is a member of the local Administrators group, or as LocalSystem. This is derived from the need to invoke the LoadUserProfile API, which mentions these requirements in its specification. This is in addition to the SE_TCB privilege mentioned in YARN-1063, but this requirement will automatically imply that the SE_TCB privilege is held by the nodemanager.
For the Linux speakers in the audience, the requirement is basically to run the NM as root. h2. Dedicated high privilege Service Due to the high privilege required by the WCE, we had discussed the need to isolate the high privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers. I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service onto the project is not trivial. -- This message was sent by Atlassian JIRA (v6.2#6252)
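To make the Deployment Requirements above concrete, the two settings described would look roughly like this in yarn-site.xml. This is a sketch only; the property names are the ones quoted in the description, and the group name is a placeholder for whatever Windows security group the nodemanager service principal belongs to: {code}
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.windows-secure-container-executor.group</name>
  <!-- placeholder: a Windows security group the NM service principal is a member of -->
  <value>nodemanager-group</value>
</property>
{code}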
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040567#comment-14040567 ] Wangda Tan commented on YARN-2181: -- Discussed offline with [~tassapola]; requirements of this JIRA: *App page:* 1) Total number of task containers preempted in this app 2) Total number of am containers preempted in this app 3) Total resource preempted in this app 4) Total number of task containers preempted in latest attempt 5) Total number of am containers preempted in latest attempt 6) Total resource preempted in latest attempt *Queue page:* 1) Total number of task containers preempted in this queue 2) Total number of am containers preempted in this queue 3) Total resource preempted in this queue Please let me know if you have any comments. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan We need to add preemption info to the RM web page so that administrators/users can better understand the preemption that happened on an app/queue, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040589#comment-14040589 ] Hadoop QA commented on YARN-2191: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651949/YARN-2191.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4046//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4046//console This message is automatically generated. Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2193) Job history UI value are wrongly rendered
Ashutosh Jindal created YARN-2193: - Summary: Job history UI value are wrongly rendered Key: YARN-2193 URL: https://issues.apache.org/jira/browse/YARN-2193 Project: Hadoop YARN Issue Type: Bug Reporter: Ashutosh Jindal Job history UI values are wrongly rendered because some fields are missing in the jhist file -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2193) Job history UI value are wrongly rendered
[ https://issues.apache.org/jira/browse/YARN-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Jindal updated YARN-2193: -- Attachment: issue.jpg Job history UI value are wrongly rendered - Key: YARN-2193 URL: https://issues.apache.org/jira/browse/YARN-2193 Project: Hadoop YARN Issue Type: Bug Reporter: Ashutosh Jindal Attachments: issue.jpg Job history UI values are wrongly rendered because some fields are missing in the jhist file -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040726#comment-14040726 ] Zhijie Shen commented on YARN-2181: --- Wangda, is it good to do something similar to the job page of JHS? Say we show the total number of task containers preempted in this app; this number is associated with a link, which redirects users to the list of all the preempted containers. Similarly for the other numbers. Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan We need to add preemption info to the RM web page so that administrators/users can better understand the preemption that happened on an app/queue, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2193) Job history UI value are wrongly rendered
[ https://issues.apache.org/jira/browse/YARN-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040735#comment-14040735 ] Zhijie Shen commented on YARN-2193: --- It seems that the data in jobsDataTable has been corrupted. bq. because some fields are missing in jhist file [~ashutosh_jindal], would you please share what was missing in the jhist file? Job history UI value are wrongly rendered - Key: YARN-2193 URL: https://issues.apache.org/jira/browse/YARN-2193 Project: Hadoop YARN Issue Type: Bug Reporter: Ashutosh Jindal Attachments: issue.jpg Job history UI values are wrongly rendered because some fields are missing in the jhist file -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040745#comment-14040745 ] Chen He commented on YARN-2109: --- Done TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
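A minimal sketch of the fix described in this issue (pin the scheduler for this one test and clear the statically shared queue metrics afterwards). The surrounding test body is elided, and the cleanup calls are the usual test-only helpers, assuming they are available on this branch: {code}
@Test
public void testNMTokenSentForNormalContainer() throws Exception {
  YarnConfiguration conf = new YarnConfiguration();
  // This test requires the CapacityScheduler regardless of the configured default.
  conf.setClass(YarnConfiguration.RM_SCHEDULER,
      CapacityScheduler.class, ResourceScheduler.class);
  MockRM rm = new MockRM(conf);
  try {
    rm.start();
    // ... existing test body ...
  } finally {
    rm.stop();
    // Without this, QueueMetrics stays cached as the CapacityScheduler flavor and
    // a later FairScheduler.reinitialize() fails with the ClassCastException above.
    QueueMetrics.clearQueueMetrics();
    DefaultMetricsSystem.shutdown();
  }
}
{code}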
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040841#comment-14040841 ] Sunil G commented on YARN-2022: --- Thank you [~leftnoteasy] for the comments. I will update the patch to handle the changes from YARN-1368. bq.With this condition, container preemption will be interrupted when we have am-capacity reached maxAMCapacity or less, is it what the original design? As per the discussion with Mayank and Carlo, it was decided to upload a simple patch that respects the AM Resource percent only. I had an offline discussion earlier with [~curino] regarding the Max Capacity and AM Resource percent. AM Resource percent considers the max capacity of a Queue. There is scope for improving this solution in that aspect, which I feel we can do in another JIRA. I will raise a separate JIRA for the same. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Job J3 will get killed including its AM. It is better if the AM can be given the least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when the cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2072) RM/NM UIs and webservices are missing vcore information
[ https://issues.apache.org/jira/browse/YARN-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-2072: - Attachment: YARN-2072.patch Thanks for the review Tom! I fixed the getReservedVirtualCores() bug and the typo. I will file a followup jira for displaying the vcores the user would use (as opposed to today's default of 1) in the capacity and fifo schedulers. RM/NM UIs and webservices are missing vcore information --- Key: YARN-2072 URL: https://issues.apache.org/jira/browse/YARN-2072 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, webapp Affects Versions: 3.0.0, 2.4.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2072.patch, YARN-2072.patch Change RM and NM UIs and webservices to include virtual cores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2130: - Attachment: YARN-2130.5.patch Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2072) RM/NM UIs and webservices are missing vcore information
[ https://issues.apache.org/jira/browse/YARN-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041012#comment-14041012 ] Hadoop QA commented on YARN-2072: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651991/YARN-2072.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4047//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4047//console This message is automatically generated. RM/NM UIs and webservices are missing vcore information --- Key: YARN-2072 URL: https://issues.apache.org/jira/browse/YARN-2072 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, webapp Affects Versions: 3.0.0, 2.4.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2072.patch, YARN-2072.patch Change RM and NM UIs and webservices to include virtual cores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041013#comment-14041013 ] Tsuyoshi OZAWA commented on YARN-2130: -- [~kkambatl], thank you for the review. Updated the patch to address the comments: 1. Made RMAppManager's and ResourceTrackerService's constructors minimal. 2. Left the fields in ClientRMService as they were. 3. Fixed the tests to pass, including the initialization order of mocks and making mocks point to the correct objects; TestClientRMService#mockResourceScheduler is one of them. Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2109: Attachment: YARN-2109.001.patch TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test Attachments: YARN-2109.001.patch testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2109: Attachment: YARN-2109.001.patch Submitting the patch TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test Attachments: YARN-2109.001.patch, YARN-2109.001.patch testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2144: -- Attachment: YARN-2144.patch Looks good overall, did some minor edits myself. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041063#comment-14041063 ] Jian He commented on YARN-2191: --- looks good, +1 Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041077#comment-14041077 ] Junping Du commented on YARN-1341: -- bq. Yes, applications should be like containers. If we fail to store an application start in the state store then we should fail the container launch that triggered the application to be added. This already happens in the current patch for YARN-1354. If we fail to store the completion of an application then worst-case we will report an application to the RM on restart that isn't active, and the RM will correct the NM when it re-registers. That makes sense. I guess we should do additional work to check that the behavior is as we expect. bq. I wasn't planning on persisting metrics during restart, as there are quite a few (e.g.: RPC metrics, etc.), and I'm not sure it's critical that they be preserved across a restart. Does RM restart do this or are there plans to do so? I think these metrics are important, especially for users' monitoring tools, and we should keep this info consistent during restart. As far as I know, RM restart doesn't track this because these metrics are recovered during event recovery in RM restart. In the current NM restart, some metrics could be lost, e.g. allocatedContainers, etc. I think we should either count them back as part of events during recovery or persist them. Thoughts? bq. Therefore I don't believe the effort to maintain a stale tag is going to be worth it. Also if we refuse to load a state store that's stale then we are going to leak containers because we won't try to recover anything from a stale state store. If so, how about we don't apply these changes until they can be persisted? That way we keep the state store and the NM's current state consistent. Even if we choose to fail the NM, we can still load the state and recover the work. bq. Instead I think we should decide in the various store failure cases whether the error should be fatal to the operation (which may lead to it being fatal to the NM overall) or if we feel the recovery with stale information is a better outcome than taking the NM down. In the latter case we should just log the error and move on. Do we expect some operations to fail while others succeed? If this just means the store is unavailable short-term, we can handle it by adding retries. If not, we should expect the fatal operations to fail soon enough anyway, and in that case logging the error and moving on for the non-fatal operations doesn't make much difference. No? Recover NMTokens upon nodemanager restart - Key: YARN-1341 URL: https://issues.apache.org/jira/browse/YARN-1341 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2022: -- Attachment: YARN-2022.7.patch I have updated the patch w.r.t. YARN-1368. Also added a test case to verify that the RMContainer is marked as an AM container even after RM restart/failover. Thank you [~leftnoteasy] for pointing this out. Please review. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Job J3 will get killed including its AM. It is better if the AM can be given the least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when the cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041086#comment-14041086 ] Craig Welch commented on YARN-1039: --- [~ste...@apache.org] wrt the need for a container-level flag / a way for the application master to launch long lived containers - definitely, but the idea was for that to come as a later step. That may be short-sighted, though; it may be better to come up now with a common way to do this for the application master container and the containers it later launches, instead of ending up with unmatched approaches later... This first step is to provide a way for the application master to be launched in a long lived container (generally, an application master for a long lived application will itself need to be launched in a long lived container - at least, it needs to be possible to do so) - which is why there needs to be some way to indicate the need for a long lived container during application submission (necessary but not sufficient overall...). [~zjshen] I was also wondering about using the tags, but after talking with [~xgong] we don't think that is the way to go, because tags don't seem to be about changing behavior but only a freeform way to enable search/display/etc. After this discussion and some looking around, it really seems that what we are after is a way to communicate a quality of the needed container to the resource manager, both at application submission (for the application master container) and for later container launches by the master - kind of like the ResourceProto, which is already present in both cases for the same reason. (I suggested adding it there, actually, as something necessary for the container, but [~xgong] objected, thinking it is really specific to metric qualities (cpu, memory...).) I'm going to take a look at adding something alongside/similar to the ResourceProto to indicate constraints/requirements for the container, starting with long lived, that can be common to application submission and to when containers are started later by the application - not necessarily a long field for bit manipulation, but something which is also extensible. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2191: -- Attachment: YARN-2191.patch Changed the test name to be more accurate Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041096#comment-14041096 ] Hadoop QA commented on YARN-2109: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652006/YARN-2109.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4049//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4049//console This message is automatically generated. TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test Attachments: YARN-2109.001.patch, YARN-2109.001.patch testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. 
For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041095#comment-14041095 ] Hadoop QA commented on YARN-2130: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652002/YARN-2130.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4048//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4048//console This message is automatically generated. Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext Key: YARN-2130 URL: https://issues.apache.org/jira/browse/YARN-2130 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, YARN-2130.4.patch, YARN-2130.5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041105#comment-14041105 ] Steve Loughran commented on YARN-1039: -- I see. I'd assume that the service flag would imply long-lived, but maybe they could be separated. I'd like to see a {{long}} enum of flags here as it's easier to be forwards compatible. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
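Purely as an illustration of the {{long}} enum-of-flags idea (the names below are hypothetical and come neither from this JIRA nor from YARN itself): each flag owns one bit of a single long carried on the request, so an older server simply ignores bits it does not know, which is the forwards-compatibility property being discussed.
{code}
public enum ContainerRequestFlag {
  LONG_LIVED(1L << 0),
  SERVICE(1L << 1);   // hypothetical future flag

  private final long mask;
  ContainerRequestFlag(long mask) { this.mask = mask; }

  /** Pack a set of flags into one long for the wire format. */
  public static long encode(java.util.EnumSet<ContainerRequestFlag> flags) {
    long bits = 0L;
    for (ContainerRequestFlag f : flags) {
      bits |= f.mask;
    }
    return bits;
  }

  /** Decode, silently dropping bits this version does not understand. */
  public static java.util.EnumSet<ContainerRequestFlag> decode(long bits) {
    java.util.EnumSet<ContainerRequestFlag> set =
        java.util.EnumSet.noneOf(ContainerRequestFlag.class);
    for (ContainerRequestFlag f : values()) {
      if ((bits & f.mask) != 0) {
        set.add(f);
      }
    }
    return set;
  }
}
{code}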
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041110#comment-14041110 ] Craig Welch commented on YARN-1039: --- The more I look around, the better I like the idea of adding it to the resource proto. It is the same kind of thing as the items already in there - it's a characteristic required for the container (it isn't a metric style quality, but still, it's a characteristic of the resource needed) and it is already present everywhere the information is needed (at application submission and when containers are requested). Adding something so similar alongside the resource proto seems unnecessary. Do you agree with [~xgong]'s concerns or do you think it makes sense to add it there? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041128#comment-14041128 ] Lohit Vijayarenu commented on YARN-796: --- As [~tucu00] mentioned, a label sounds closely related to affinity and should be treated less as a resource. It becomes closely related to resources when it comes to exposing them on scheduler queues and exposing that to users who wish to schedule their jobs on a certain set of labeled nodes. This is definitely a very useful feature to have. Looking forward to the design document. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2109) TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041161#comment-14041161 ] Hadoop QA commented on YARN-2109: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652007/YARN-2109.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4050//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4050//console This message is automatically generated. TestRM fails some tests when some tests run with CapacityScheduler and some with FairScheduler -- Key: YARN-2109 URL: https://issues.apache.org/jira/browse/YARN-2109 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Labels: test Attachments: YARN-2109.001.patch, YARN-2109.001.patch testNMTokenSentForNormalContainer requires CapacityScheduler and was fixed in [YARN-1846|https://issues.apache.org/jira/browse/YARN-1846] to explicitly set it to be CapacityScheduler. But if the default scheduler is set to FairScheduler then the rest of the tests that execute after this will fail with invalid cast exceptions when getting queuemetrics. This is based on test execution order as only the tests that execute after this test will fail. This is because the queuemetrics will be initialized by this test to QueueMetrics and shared by the subsequent tests. We can explicitly clear the metrics at the end of this test to fix this. 
For example java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:103) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1275) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:418) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:808) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:230) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:90) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:85) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:81) at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMToken(TestRM.java:232) -- This message was sent by Atlassian JIRA (v6.2#6252)
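The fix the description suggests — explicitly clearing the shared metrics at the end of the test — could look roughly like the teardown sketch below, assuming the standard QueueMetrics/DefaultMetricsSystem test helpers; it is not the attached YARN-2109 patch.
{code}
// Sketch only: reset the shared metrics state after a test so that a
// subsequently created RM (possibly with a different scheduler) can
// re-register "QueueMetrics,q0=root" from scratch.
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.junit.After;

public class TestSchedulerMetricsCleanup {
  @After
  public void tearDown() {
    QueueMetrics.clearQueueMetrics();  // drop the cached root queue metrics
    DefaultMetricsSystem.shutdown();   // unregister all metrics sources
  }
}
{code}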
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.008.patch Addressed all comments ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041172#comment-14041172 ] Hadoop QA commented on YARN-2144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652015/YARN-2144.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4051//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4051//console This message is automatically generated. Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
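A log line satisfying the three properties listed above (INFO level, container id visible, AM vs. task container distinguishable) could look roughly like this sketch; the class, method, and parameter names are illustrative assumptions, not the attached patch.
{code}
// Illustrative only: an INFO-level preemption log line that carries the
// container id and says whether the preempted container was an AM container.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class PreemptionLogSketch {
  private static final Log LOG = LogFactory.getLog(PreemptionLogSketch.class);

  static void logPreemption(String containerId, boolean isAMContainer, String queue) {
    LOG.info("Preempting " + (isAMContainer ? "AM" : "task")
        + " container " + containerId + " from queue " + queue);
  }

  public static void main(String[] args) {
    logPreemption("container_1403000000000_0001_01_000002", false, "root.a");
  }
}
{code}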
[jira] [Created] (YARN-2194) Add Cgroup support for RedHat 7
Wei Yan created YARN-2194: - Summary: Add Cgroup support for RedHat 7 Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041228#comment-14041228 ] Hadoop QA commented on YARN-2022: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652024/YARN-2022.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4052//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4052//console This message is automatically generated. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041239#comment-14041239 ] Hadoop QA commented on YARN-2191: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652025/YARN-2191.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4053//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4053//console This message is automatically generated. Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041255#comment-14041255 ] Jon Bringhurst commented on YARN-2194: -- It might also be useful to have a SystemdNspawnContainerExecutor for yarn.nodemanager.container-executor.class. I don't know how many people would be interested in using it, however. Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041263#comment-14041263 ] Wei Yan commented on YARN-2194: --- SystemdNspawnContainerExecutor is a good idea. We could add one for systemd alongside the standard CgroupsLCEHandler. Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041267#comment-14041267 ] Hadoop QA commented on YARN-1365: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652034/YARN-1365.008.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4054//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4054//console This message is automatically generated. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2195) Clean a piece of code in ResourceRequest
Wei Yan created YARN-2195: - Summary: Clean a piece of code in ResourceRequest Key: YARN-2195 URL: https://issues.apache.org/jira/browse/YARN-2195 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor {code} if (numContainersComparison == 0) { return 0; } else { return numContainersComparison; } {code} This code should be cleaned as {code} return numContainersComparison; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2195) Clean a piece of code in ResourceRequest
[ https://issues.apache.org/jira/browse/YARN-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2195: -- Attachment: YARN-2195.patch Clean a piece of code in ResourceRequest Key: YARN-2195 URL: https://issues.apache.org/jira/browse/YARN-2195 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2195.patch {code} if (numContainersComparison == 0) { return 0; } else { return numContainersComparison; } {code} This code should be cleaned as {code} return numContainersComparison; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2195) Clean a piece of code in ResourceRequest
[ https://issues.apache.org/jira/browse/YARN-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041331#comment-14041331 ] Hadoop QA commented on YARN-2195: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652055/YARN-2195.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4055//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4055//console This message is automatically generated. Clean a piece of code in ResourceRequest Key: YARN-2195 URL: https://issues.apache.org/jira/browse/YARN-2195 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2195.patch {code} if (numContainersComparison == 0) { return 0; } else { return numContainersComparison; } {code} This code should be cleaned as {code} return numContainersComparison; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041355#comment-14041355 ] Anubhav Dhoot commented on YARN-1365: - The changes for addApplication caused the failures. I am going to open a separate jira to fix that, as per Jian's suggestion, and undo those changes here. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.008.patch Without the addApplication changes. Those will be covered in YARN-2196 ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2196) Add Application duplicate APP_ACCEPTED events can be prevented with a flag
Anubhav Dhoot created YARN-2196: --- Summary: Add Application duplicate APP_ACCEPTED events can be prevented with a flag Key: YARN-2196 URL: https://issues.apache.org/jira/browse/YARN-2196 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Anubhav Dhoot YARN-1365 adds a flag to AddApplicationAttemptSchedulerEvent that prevents a duplicate ATTEMPT_ADDED event in recovery. We can do something similar to AddApplicationSchedulerEvent to avoid the following transition. {code} // ACCECPTED state can once again receive APP_ACCEPTED event, because on // recovery the app returns ACCEPTED state and the app once again go // through the scheduler and triggers one more APP_ACCEPTED event at // ACCEPTED state. .addTransition(RMAppState.ACCEPTED, RMAppState.ACCEPTED, RMAppEventType.APP_ACCEPTED) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
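The proposed flag could look roughly like the sketch below; the class and field names are assumptions for illustration and do not necessarily match the eventual patch.
{code}
// Sketch of the idea (names are hypothetical): the scheduler event carries
// whether the app is being re-added as part of RM recovery, and APP_ACCEPTED
// is only emitted for a fresh submission.
public class AddApplicationEventSketch {

  static class AppAddedEvent {
    final String applicationId;
    final boolean isAppRecovering; // true when re-added after RM restart

    AppAddedEvent(String applicationId, boolean isAppRecovering) {
      this.applicationId = applicationId;
      this.isAppRecovering = isAppRecovering;
    }
  }

  static void handle(AppAddedEvent event) {
    // add the application to the scheduler's internal structures here ...
    if (!event.isAppRecovering) {
      System.out.println("send APP_ACCEPTED for " + event.applicationId);
    } // on recovery, skip the duplicate APP_ACCEPTED
  }

  public static void main(String[] args) {
    handle(new AppAddedEvent("application_1403000000000_0001", false));
    handle(new AppAddedEvent("application_1403000000000_0002", true));
  }
}
{code}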
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041433#comment-14041433 ] Hadoop QA commented on YARN-1365: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652074/YARN-1365.008.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4056//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4056//console This message is automatically generated. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041435#comment-14041435 ] Mayank Bansal commented on YARN-2022: - Hi [~vinodkv], is it OK with you if we commit this patch, since you had concerns before? I think we still need to avoid killing AMs even if we have a patch that prevents applications from being killed when their AM gets killed. Please suggest. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
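One way to give AM containers least priority, sketched under assumed names (this is not the attached YARN-2022 patch), is to order each queue's preemption candidates so that AM containers are considered only after all task containers:
{code}
// Illustrative sketch: sort preemption candidates so AM containers go last.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PreemptionOrderSketch {

  static class Candidate {
    final String containerId;
    final boolean isAMContainer;
    Candidate(String containerId, boolean isAMContainer) {
      this.containerId = containerId;
      this.isAMContainer = isAMContainer;
    }
  }

  static void sortForPreemption(List<Candidate> candidates) {
    // false (task container) sorts before true (AM container)
    candidates.sort(Comparator.comparing((Candidate c) -> c.isAMContainer));
  }

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<>();
    candidates.add(new Candidate("container_j3_am", true));
    candidates.add(new Candidate("container_j3_map1", false));
    candidates.add(new Candidate("container_j2_map1", false));
    sortForPreemption(candidates);
    for (Candidate c : candidates) {
      System.out.println(c.containerId + " am=" + c.isAMContainer);
    }
  }
}
{code}
In the J1-J4 scenario above, this ordering would preempt maps from J3 and J2 before ever touching J3's AM container.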
[jira] [Updated] (YARN-2078) yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented
[ https://issues.apache.org/jira/browse/YARN-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2078: - Description: We should document the condition when uber mode is enabled. Currently, users need to read following code to understand the condition. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} was: We should document the condition when uber mode is enabled. If not, users need to read code. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} yarn.app.am.resource.mb/cpu-vcores affects uber mode but is not documented -- Key: YARN-2078 URL: https://issues.apache.org/jira/browse/YARN-2078 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2078.1.patch We should document the condition when uber mode is enabled. Currently, users need to read following code to understand the condition. {code} boolean smallMemory = ( (Math.max(conf.getLong(MRJobConfig.MAP_MEMORY_MB, 0), conf.getLong(MRJobConfig.REDUCE_MEMORY_MB, 0)) <= sysMemSizeForUberSlot) || (sysMemSizeForUberSlot == JobConf.DISABLED_MEMORY_LIMIT)); boolean smallCpu = Math.max( conf.getInt( MRJobConfig.MAP_CPU_VCORES, MRJobConfig.DEFAULT_MAP_CPU_VCORES), conf.getInt( MRJobConfig.REDUCE_CPU_VCORES, MRJobConfig.DEFAULT_REDUCE_CPU_VCORES)) <= sysCPUSizeForUberSlot {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041494#comment-14041494 ] Tsuyoshi OZAWA commented on YARN-2052: -- Jian, thank you for clarifying. I'm working to address the comments. Please wait a moment. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041506#comment-14041506 ] Jian He commented on YARN-1365: --- Can we also revert the RMAppImpl changes? ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc
Akira AJISAKA created YARN-2197: --- Summary: Add a link to YARN CHANGES.txt in the left side of doc Key: YARN-2197 URL: https://issues.apache.org/jira/browse/YARN-2197 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Priority: Minor Now there're the links to Common, HDFS and MapReduce CHANGES.txt in the left side of the document (hadoop-project/src/site/site.xml), but YARN does not exist. {code} <item name="Common CHANGES.txt" href="hadoop-project-dist/hadoop-common/CHANGES.txt"/> <item name="HDFS CHANGES.txt" href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/> <item name="MapReduce CHANGES.txt" href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/> <item name="Metrics" href="hadoop-project-dist/hadoop-common/Metrics.html"/> {code} A link to YARN CHANGES.txt should be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041543#comment-14041543 ] Wangda Tan commented on YARN-2144: -- Thanks Jian for review! Add logs when preemption occurs --- Key: YARN-2144 URL: https://issues.apache.org/jira/browse/YARN-2144 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.5.0 Reporter: Tassapol Athiapinya Assignee: Wangda Tan Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch There should be easy-to-read logs when preemption does occur. RM logs should have following properties: * Logs are retrievable when an application is still running and often flushed. * Can distinguish between AM container preemption and task container preemption with container ID shown. * Should be INFO level log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041537#comment-14041537 ] Tsuyoshi OZAWA commented on YARN-2052: -- The brief design is as follows: 1. Add a getter method for the epoch, like {{getEpoch}}, to RMContext. 2. Add {{loadEpoch}} to RMStateStore and set the epoch value on the RMContext in {{ResourceManager#serviceStart}}. One discussion point is how to serialize the epoch. Can we add an epoch definition to yarn_server_resourcemanager_service_protos.proto? [~jianhe], what do you think? ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
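A rough sketch of the two additions described in the comment above; the method names follow the comment, but the bodies and helper classes are assumptions only, not the eventual patch.
{code}
// Sketch of the proposal: the RM context exposes the epoch, and the state
// store loads/increments it on RM (re)start so that container ids minted
// after a restart cannot collide with earlier ones.
public class EpochSketch {

  // 1. Proposed getter on the RM context.
  interface RMContextView {
    long getEpoch();
  }

  // 2. Proposed load method on the state store.
  static abstract class StateStoreSketch {
    abstract long loadEpoch() throws Exception;
  }

  static class InMemoryStateStore extends StateStoreSketch {
    private long epoch = 0;
    @Override
    long loadEpoch() {
      // A real store would persist the incremented value, e.g. as a new field
      // serialized via yarn_server_resourcemanager_service_protos.proto.
      return epoch++;
    }
  }

  public static void main(String[] args) throws Exception {
    InMemoryStateStore store = new InMemoryStateStore();
    final long epoch = store.loadEpoch();     // done once in serviceStart()
    RMContextView context = () -> epoch;
    System.out.println("epoch for this RM incarnation: " + context.getEpoch());
  }
}
{code}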
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041540#comment-14041540 ] Hudson commented on YARN-2191: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5756 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5756/]) YARN-2191. Added a new test to ensure NM will clean up completed applications in the case of RM restart. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1604949) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2191) Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed
[ https://issues.apache.org/jira/browse/YARN-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041544#comment-14041544 ] Wangda Tan commented on YARN-2191: -- Thanks for Jian's review and commit! Add a test to make sure NM will do application cleanup even if RM restarting happens before application completed - Key: YARN-2191 URL: https://issues.apache.org/jira/browse/YARN-2191 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2191.patch, YARN-2191.patch, YARN-2191.patch, YARN-2191.patch In YARN-1885, there's a test in TestApplicationCleanup#testAppCleanupWhenRestartedAfterAppFinished. However, we need one more test to make sure NM will do app cleanup when restart happens before app finished. The sequence is, 1. Submit app1 to RM1 2. NM1 launches app1's AM (container-0), NM2 launches app1's task containers. 3. Restart RM1 4. Before RM1 finishes restarting, container-0 completed in NM1 5. RM1 finishes restarting, NM1 report container-0 completed, app1 will be completed 6. RM1 should be able to notify NM1/NM2 to cleanup app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041548#comment-14041548 ] Jian He commented on YARN-2052: --- sounds good ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.009.patch Without the RMAppImpl changes ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.009.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc
[ https://issues.apache.org/jira/browse/YARN-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2197: Attachment: YARN-2197.patch Attaching a patch. Add a link to YARN CHANGES.txt in the left side of doc -- Key: YARN-2197 URL: https://issues.apache.org/jira/browse/YARN-2197 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Priority: Minor Labels: newbie Attachments: YARN-2197.patch Now there're the links to Common, HDFS and MapReduce CHANGES.txt in the left side of the document (hadoop-project/src/site/site.xml), but YARN does not exist. {code} <item name="Common CHANGES.txt" href="hadoop-project-dist/hadoop-common/CHANGES.txt"/> <item name="HDFS CHANGES.txt" href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/> <item name="MapReduce CHANGES.txt" href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/> <item name="Metrics" href="hadoop-project-dist/hadoop-common/Metrics.html"/> {code} A link to YARN CHANGES.txt should be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041565#comment-14041565 ] Tsuyoshi OZAWA commented on YARN-2052: -- OK! ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc
[ https://issues.apache.org/jira/browse/YARN-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041587#comment-14041587 ] Hadoop QA commented on YARN-2197: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652107/YARN-2197.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4058//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4058//console This message is automatically generated. Add a link to YARN CHANGES.txt in the left side of doc -- Key: YARN-2197 URL: https://issues.apache.org/jira/browse/YARN-2197 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: YARN-2197.patch Now there're the links to Common, HDFS and MapReduce CHANGES.txt in the left side of the document (hadoop-project/src/site/site.xml), but YARN does not exist. {code} <item name="Common CHANGES.txt" href="hadoop-project-dist/hadoop-common/CHANGES.txt"/> <item name="HDFS CHANGES.txt" href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/> <item name="MapReduce CHANGES.txt" href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/> <item name="Metrics" href="hadoop-project-dist/hadoop-common/Metrics.html"/> {code} A link to YARN CHANGES.txt should be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041600#comment-14041600 ] Hadoop QA commented on YARN-1365: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652104/YARN-1365.009.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4057//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4057//console This message is automatically generated. ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.001.patch, YARN-1365.002.patch, YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.007.patch, YARN-1365.008.patch, YARN-1365.008.patch, YARN-1365.009.patch, YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-2069: --- Assignee: Mayank Bansal (was: Vinod Kumar Vavilapalli) Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041617#comment-14041617 ] Mayank Bansal commented on YARN-2069: - Taking it over. Thanks, Mayank Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-1.patch Attaching patch Thanks, Mayank Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041650#comment-14041650 ] Hadoop QA commented on YARN-2069: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12652121/YARN-2069-trunk-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4059//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4059//console This message is automatically generated. Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041652#comment-14041652 ] Vinod Kumar Vavilapalli commented on YARN-1039: --- I am not against a container/resource level definition of whether that container is long lived or not, but I think it is equally important to mark at the application level if _at least_ one container in the application is considered long lived. So, to summarize, how about - an app-level isLongRunning() that indicates _if at least one container of this application will be long-running_ and - a resource-request level isLongRunning() that indicates _if this container is long running or not_. The app-level flag can help UIs, making very quick scheduling distinctions etc. Thoughts? Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Priority: Minor Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
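For illustration only: a minimal sketch of the two-level flag discussed above, assuming hypothetical isLongRunning/setLongRunning accessors. No such fields exist in the current ApplicationSubmissionContext or ResourceRequest API; this is just to show how the two levels would relate.
{code}
// Hypothetical sketch only -- these accessors do not exist in the current YARN API.

/** App-level: true if at least one container of this application is long-running. */
interface LongRunningApp {
  boolean isLongRunning();
  void setLongRunning(boolean longRunning);
}

/** Request-level: true if containers allocated for this request are long-running. */
interface LongRunningResourceRequest {
  boolean isLongRunning();
  void setLongRunning(boolean longRunning);
}

// A scheduler or UI could cheaply skip per-request checks for applications that
// declare no long-running containers at all:
//   if (!app.isLongRunning()) { /* normal placement */ }
//   else { /* consult the request-level flag for each ResourceRequest */ }
{code}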
[jira] [Updated] (YARN-1773) ShuffleHeader should have a format that can inform about errors
[ https://issues.apache.org/jira/browse/YARN-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated YARN-1773: -- Target Version/s: 2.5.0 (was: 2.4.0) Affects Version/s: 2.4.0 ShuffleHeader should have a format that can inform about errors --- Key: YARN-1773 URL: https://issues.apache.org/jira/browse/YARN-1773 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0, 2.4.0 Reporter: Bikas Saha Priority: Critical Currently, the ShuffleHeader (which is a Writable) simply tries to read the successful header (mapid, reduceid, etc.). If there is an error, the input will have an error message instead of (mapid, reduceid, etc.). Parsing the ShuffleHeader therefore fails, and since we don't know where the error message ends, we cannot consume the remaining input stream, which may have good data from the remaining map outputs. Being able to encode the error in the ShuffleHeader will let us parse out the error correctly and move on to the remaining data. The shuffle handler response should say which maps are in error and which are fine, and what the error was for the erroneous maps. These will help report diagnostics for easier upstream reporting. -- This message was sent by Atlassian JIRA (v6.2#6252)
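For illustration, a hedged sketch of a self-describing header along the lines proposed above. The class name, the leading status byte, and the field layout are assumptions for this example, not the actual ShuffleHeader wire format; the point is only that a length-prefixed error message keeps the stream position recoverable.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

/**
 * Illustrative only -- not the real ShuffleHeader. A status byte written first
 * tells the reader whether the fields that follow are map output metadata or a
 * length-prefixed error message for that map.
 */
public class StatusShuffleHeader implements Writable {
  public static final byte OK = 0;
  public static final byte ERROR = 1;

  private byte status = OK;
  private String mapId = "";
  private long compressedLength;
  private long uncompressedLength;
  private int reduceId;
  private String errorMessage = "";

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeByte(status);
    WritableUtils.writeString(out, mapId);
    if (status == OK) {
      WritableUtils.writeVLong(out, compressedLength);
      WritableUtils.writeVLong(out, uncompressedLength);
      WritableUtils.writeVInt(out, reduceId);
    } else {
      // Length-prefixed, so the reader can skip past it and continue with the
      // next map's data instead of abandoning the rest of the stream.
      WritableUtils.writeString(out, errorMessage);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    status = in.readByte();
    mapId = WritableUtils.readString(in);
    if (status == OK) {
      compressedLength = WritableUtils.readVLong(in);
      uncompressedLength = WritableUtils.readVLong(in);
      reduceId = WritableUtils.readVInt(in);
    } else {
      errorMessage = WritableUtils.readString(in);
    }
  }
}
{code}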
[jira] [Commented] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9
[ https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041684#comment-14041684 ] Hadoop QA commented on YARN-1327: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609276/nodemgr-portability.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4060//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4060//console This message is automatically generated. Fix nodemgr native compilation problems on FreeBSD9 --- Key: YARN-1327 URL: https://issues.apache.org/jira/browse/YARN-1327 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Radim Kolar Assignee: Radim Kolar Fix For: 3.0.0, 2.5.0 Attachments: nodemgr-portability.txt There are several portability problems preventing the native component from compiling on FreeBSD. 1. libgen.h is not included. The correct function prototype is there, but Linux glibc has a workaround to define it for the user if libgen.h is not directly included. Include this file directly. 2. Query the max size of the login name using sysconf. This follows the same code style as the rest of the code, which already uses sysconf. 3. Cgroups are a Linux-only feature; make the compilation conditional and return an error if mount_cgroup is attempted on a non-Linux OS. 4. Do not use the POSIX function setpgrp() since it clashes with the same-named function from BSD 4.2; use an equivalent function. After inspecting the glibc sources, it is just a shortcut for setpgid(0,0). These changes make it compile on both Linux and FreeBSD. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041690#comment-14041690 ] Sunil G commented on YARN-2069: --- Hi [~mayank_bansal] I guess this patch also has the code changes of YARN-2022. I think this can be separated. Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2193) Job history UI value are wrongly rendered
[ https://issues.apache.org/jira/browse/YARN-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041700#comment-14041700 ] Ashutosh Jindal commented on YARN-2193: --- During ApplicationMaster start up, JobHistoryEventHandler initializes the writer. This is a one-time initialization. If it fails because of an NN problem, then none of the events are written. In this issue, because of NN safe mode, the writer is not initialized and only the Job_Finished event is written. The history server then parses the jhist file, but the Job_Finished event does not contain all the fields, so some of the fields are missing. Job history UI value are wrongly rendered - Key: YARN-2193 URL: https://issues.apache.org/jira/browse/YARN-2193 Project: Hadoop YARN Issue Type: Bug Reporter: Ashutosh Jindal Attachments: issue.jpg Job history UI values are wrongly rendered because some fields are missing in the jhist file -- This message was sent by Atlassian JIRA (v6.2#6252)
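A simplified sketch of the failure mode described in the comment above, using a hypothetical handler class; it is not the actual JobHistoryEventHandler code, but it shows why events raised before a failed one-time writer initialization never reach the history file.
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a one-time-initialized history writer. */
class HistoryWriterSketch {
  private final List<String> written = new ArrayList<String>();
  private boolean initialized;

  /** One-time setup; if the NameNode is in safe mode this can fail and is not retried. */
  void setup() throws IOException {
    // In the real handler this is where the jhist output stream would be created.
    initialized = true;
  }

  /** Events arriving before setup() succeeded are effectively lost. */
  void handle(String event) {
    if (!initialized) {
      return; // dropped -- the resulting jhist file lacks the fields these events carry
    }
    written.add(event);
  }
}
{code}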
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-1480: -- Attachment: YARN-1480-6.patch Thank you for reviewing, [~zjshen]. Attached an updated patch. - Added the tags option as appTags. - The queue option is also available as an application filter in this patch. - Removed the local filters and changed to use ApplicationClientProtocol#getApplications via YarnClient. Only the finalStatus filter is left as a local filter because that operation is unsupported. RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch Nowadays the RM web services getApps() accepts many more filters than the ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Is it better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.2#6252)
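For context, a minimal sketch of fetching filtered application reports through YarnClient, which is the path the patch description refers to. The application type and states used here are example inputs only, and filters such as finalStatus that the protocol does not support would still have to be applied client-side.
{code}
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListAppsSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      Set<String> types = new HashSet<String>();
      types.add("MAPREDUCE"); // example application type filter
      EnumSet<YarnApplicationState> states =
          EnumSet.of(YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING);
      // Filtering happens on the RM side, as with the web services getApps().
      List<ApplicationReport> reports = client.getApplications(types, states);
      for (ApplicationReport r : reports) {
        System.out.println(r.getApplicationId() + "\t" + r.getYarnApplicationState());
      }
    } finally {
      client.stop();
    }
  }
}
{code}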
[jira] [Commented] (YARN-1775) Create SMAPBasedProcessTree to get PSS information
[ https://issues.apache.org/jira/browse/YARN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041740#comment-14041740 ] Steven Wong commented on YARN-1775: --- [~rajesh.balamohan], can you explain why you want to exclude 'the read only shared memory mappings in the process (i.e r--s, r-xs)'? Thanks. Create SMAPBasedProcessTree to get PSS information -- Key: YARN-1775 URL: https://issues.apache.org/jira/browse/YARN-1775 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Fix For: 2.4.0 Attachments: YARN-1775-v2.patch, YARN-1775-v3.patch, YARN-1775-v3.patch, YARN-1775-v4.patch, YARN-1775-v5.patch, yarn-1775-2.4.0.patch Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
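For reference, a hedged sketch of totalling PSS from /proc/&lt;pid&gt;/smaps. It is a simplified stand-in, not the SMAP-based process tree implementation from the attached patches, and it deliberately sums every mapping without the permission-based exclusions (r--s, r-xs) being questioned above.
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PssSketch {
  /** Sums the "Pss:" lines (reported in kB) from a process's smaps file. */
  static long totalPssKb(String pid) throws IOException {
    long totalKb = 0;
    BufferedReader reader =
        new BufferedReader(new FileReader("/proc/" + pid + "/smaps"));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith("Pss:")) {
          // Line format: "Pss:                 123 kB"
          String[] parts = line.trim().split("\\s+");
          totalKb += Long.parseLong(parts[1]);
        }
      }
    } finally {
      reader.close();
    }
    return totalKb;
  }

  public static void main(String[] args) throws IOException {
    String pid = args.length > 0 ? args[0] : "self";
    System.out.println("PSS (kB): " + totalPssKb(pid));
  }
}
{code}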