[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932908#comment-13932908 ] Hadoop QA commented on YARN-1591: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634353/YARN-1591.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3343//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3343//console This message is automatically generated. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-90: -- Attachment: apache-yarn-90.2.patch Fixed issue that caused the patch application to fail. NodeManager should identify failed disks becoming good back again - Key: YARN-90 URL: https://issues.apache.org/jira/browse/YARN-90 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ravi Gummadi Assignee: Varun Vasudev Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, apache-yarn-90.2.patch MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs restart. This JIRA is to improve NodeManager to reuse good disks(which could be bad some time back). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932949#comment-13932949 ] Hadoop QA commented on YARN-1389: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634349/YARN-1389.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3342//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3342//console This message is automatically generated. ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs - Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932950#comment-13932950 ] Zhijie Shen commented on YARN-1389: --- Test failures are not related. ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs - Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932957#comment-13932957 ] Zhijie Shen commented on YARN-1577: --- Hi, Naren. YARN-1389 is resolved; you can make use of ApplicationClientProtocol#getApplicationAttemptReport to get the attempt state. Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Naren Koneru Priority: Blocker Today the unmanaged AM client waits for the app state to be Accepted before launching the AM. This is broken since YARN-1493 changed the RM to start the attempt after the application is Accepted. We may need to introduce an attempt state report that the client can rely on to query the attempt state and decide when to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
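For context, a minimal sketch of how a client could poll the attempt state through the client APIs added by YARN-1389 (YarnClient#getApplicationAttempts, backed by ApplicationClientProtocol). The application id, the polling interval, and the choice of LAUNCHED as the target state are illustrative assumptions, not part of the committed patch.
{code}
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AttemptStatePoller {
  // Illustrative only: block until the latest attempt of appId reaches LAUNCHED,
  // the point at which an unmanaged AM launcher could start its AM process.
  public static ApplicationAttemptReport waitForAttemptLaunched(ApplicationId appId)
      throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      while (true) {
        List<ApplicationAttemptReport> attempts = client.getApplicationAttempts(appId);
        if (!attempts.isEmpty()) {
          ApplicationAttemptReport latest = attempts.get(attempts.size() - 1);
          if (latest.getYarnApplicationAttemptState()
              == YarnApplicationAttemptState.LAUNCHED) {
            return latest;
          }
        }
        Thread.sleep(100); // arbitrary poll interval for this sketch
      }
    } finally {
      client.stop();
    }
  }
}
{code}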
[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932960#comment-13932960 ] Hudson commented on YARN-1389: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5316 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5316/]) YARN-1389. Made ApplicationClientProtocol and ApplicationHistoryProtocol expose analogous getApplication(s)/Attempt(s)/Container(s) APIs. Contributed by Mayank Bansal. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577052) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs - Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: 2.4.0 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which 
will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932976#comment-13932976 ] Hadoop QA commented on YARN-90: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634358/apache-yarn-90.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3344//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3344//console This message is automatically generated. NodeManager should identify failed disks becoming good back again - Key: YARN-90 URL: https://issues.apache.org/jira/browse/YARN-90 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ravi Gummadi Assignee: Varun Vasudev Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, apache-yarn-90.2.patch MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs restart. This JIRA is to improve NodeManager to reuse good disks(which could be bad some time back). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1705) Cluster metrics are off after failover
[ https://issues.apache.org/jira/browse/YARN-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1705: - Attachment: YARN-1705.1.patch Hi, I attached a patch that handles the Active-Standby-Active transition, basically by clearing the cached cluster metrics and queue metrics. One open point remains: should cluster metrics take care of recovered applications (Finished, Killed and Failed)? :-( Please give your suggestions. Cluster metrics are off after failover -- Key: YARN-1705 URL: https://issues.apache.org/jira/browse/YARN-1705 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Rohith Attachments: YARN-1705.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
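A rough sketch of the clearing-the-cache idea described above. It assumes the static reset hooks on QueueMetrics, ClusterMetrics and DefaultMetricsSystem; it is not the attached patch, only an illustration of the ordering on an Active-Standby-Active transition.
{code}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;

public class MetricsResetSketch {
  // Drop cached metrics singletons when the RM becomes active again, so the
  // restarted active services register fresh counters instead of adding to
  // values left over from the previous active period.
  static void resetClusterAndQueueMetrics() {
    QueueMetrics.clearQueueMetrics();  // clear the per-queue metrics cache
    ClusterMetrics.destroy();          // drop the cluster-wide metrics singleton
    DefaultMetricsSystem.shutdown();   // let sources re-register on the next start
  }
}
{code}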
[jira] [Created] (YARN-1829) CapacityScheduler can't schedule job after misconfiguration
PengZhang created YARN-1829: --- Summary: CapacityScheduler can't schedule job after misconfiguration Key: YARN-1829 URL: https://issues.apache.org/jira/browse/YARN-1829 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: PengZhang CapacityScheduler validates a new configuration to make sure all existing queues are still present, but that check is not enough:
1. When we change a queue (named A) from leaf to parent, the change passes validation and its new child (X) is added to the queues map; later root.reinitialize() fails because the queue type has changed.
2. We then add a new parent queue (named B) with child (X) and change queue A's state to STOPPED. This applies successfully, but a job submitted to queue X can never be scheduled, because LeafQueue X was already added to the map in phase 1 and its parent points to A, which is STOPPED.
root
/
A
queues: root, A

root
/
A
/
X
reinitialize failed, but X is added to queues
queues: root, A, X

root
/ \
A   B
     \
      X
new node X will not replace old one
queues: root, A, X (value is not the LeafQueue that is in the tree) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1829) CapacityScheduler can't schedule job after misconfiguration
[ https://issues.apache.org/jira/browse/YARN-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PengZhang updated YARN-1829: Description: CapacityScheduler will validate new configuration to make sure all existing queues are still present. But it seems not enough: 1.When we change one queue(name A) from leaf to parent, it will pass validation and add it's new child(X) to queues. And later root.reinitialize() will fail because of queue type has changed. 2.Then we add new parent queue(name B) with children(X), and change queue(A)'s state to STOPPED. This will apply successfully. but job submitted to queue(X) can never be scheduled. Because LeafQueue(X) has already been added in phase 1, and it's parent points to A which is STOPPED. root / A queues: root, A root / A / X reinitialize failed, but X is added to queues queues: root, A, X root / \ A B nbspnbsp \ nbspnbsp X new node X will not replace old one queues: root, A, X(value is not LeafQueue that in the tree) was: CapacityScheduler will validate new configuration to make sure all existing queues are still present. But it seems not enough: 1.When we change one queue(name A) from leaf to parent, it will pass validation and add it's new child(X) to queues. And later root.reinitialize() will fail because of queue type has changed. 2.Then we add new parent queue(name B) with children(X), and change queue(A)'s state to STOPPED. This will apply successfully. but job submitted to queue(X) can never be scheduled. Because LeafQueue(X) has already been added in phase 1, and it's parent points to A which is STOPPED. root / A queues: root, A root / A / X reinitialize failed, but X is added to queues queues: root, A, X root / \ A B \ X new node X will not replace old one queues: root, A, X(value is not LeafQueue that in the tree) CapacityScheduler can't schedule job after misconfiguration --- Key: YARN-1829 URL: https://issues.apache.org/jira/browse/YARN-1829 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: PengZhang CapacityScheduler will validate new configuration to make sure all existing queues are still present. But it seems not enough: 1.When we change one queue(name A) from leaf to parent, it will pass validation and add it's new child(X) to queues. And later root.reinitialize() will fail because of queue type has changed. 2.Then we add new parent queue(name B) with children(X), and change queue(A)'s state to STOPPED. This will apply successfully. but job submitted to queue(X) can never be scheduled. Because LeafQueue(X) has already been added in phase 1, and it's parent points to A which is STOPPED. root / A queues: root, A root / A / X reinitialize failed, but X is added to queues queues: root, A, X root / \ A B nbspnbsp \ nbspnbsp X new node X will not replace old one queues: root, A, X(value is not LeafQueue that in the tree) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1829) CapacityScheduler can't schedule job after misconfiguration
[ https://issues.apache.org/jira/browse/YARN-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PengZhang updated YARN-1829: Description: CapacityScheduler will validate new configuration to make sure all existing queues are still present. But it seems not enough: 1.When we change one queue(name A) from leaf to parent, it will pass validation and add it's new child(X) to queues. And later root.reinitialize() will fail because of queue type has changed. 2.Then we add new parent queue(name B) with children(X), and change queue(A)'s state to STOPPED. This will apply successfully. but job submitted to queue(X) can never be scheduled. Because LeafQueue(X) has already been added in phase 1, and it's parent points to A which is STOPPED. root / A queues: root, A root / A / X reinitialize failed, but X is added to queues queues: root, A, X root / \ A B \ X new node X will not replace old one queues: root, A, X(value is not LeafQueue that in the tree) was: CapacityScheduler will validate new configuration to make sure all existing queues are still present. But it seems not enough: 1.When we change one queue(name A) from leaf to parent, it will pass validation and add it's new child(X) to queues. And later root.reinitialize() will fail because of queue type has changed. 2.Then we add new parent queue(name B) with children(X), and change queue(A)'s state to STOPPED. This will apply successfully. but job submitted to queue(X) can never be scheduled. Because LeafQueue(X) has already been added in phase 1, and it's parent points to A which is STOPPED. root / A queues: root, A root / A / X reinitialize failed, but X is added to queues queues: root, A, X root / \ A B nbspnbsp \ nbspnbsp X new node X will not replace old one queues: root, A, X(value is not LeafQueue that in the tree) CapacityScheduler can't schedule job after misconfiguration --- Key: YARN-1829 URL: https://issues.apache.org/jira/browse/YARN-1829 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: PengZhang CapacityScheduler will validate new configuration to make sure all existing queues are still present. But it seems not enough: 1.When we change one queue(name A) from leaf to parent, it will pass validation and add it's new child(X) to queues. And later root.reinitialize() will fail because of queue type has changed. 2.Then we add new parent queue(name B) with children(X), and change queue(A)'s state to STOPPED. This will apply successfully. but job submitted to queue(X) can never be scheduled. Because LeafQueue(X) has already been added in phase 1, and it's parent points to A which is STOPPED. root / A queues: root, A root / A / X reinitialize failed, but X is added to queues queues: root, A, X root / \ A B \ X new node X will not replace old one queues: root, A, X(value is not LeafQueue that in the tree) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1829) CapacityScheduler can't schedule job after misconfiguration
[ https://issues.apache.org/jira/browse/YARN-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PengZhang updated YARN-1829: Description: CapacityScheduler will validate new configuration to make sure all existing queues are still present. But it seems not enough: 1.When we change one queue(name A) from leaf to parent, it will pass validation and add it's new child(X) to queues. And later root.reinitialize() will fail because of queue type has changed. 2.Then we add new parent queue(name B) with children(X), and change queue(A)'s state to STOPPED. This will apply successfully. but job submitted to queue(X) can never be scheduled. Because LeafQueue(X) has already been added in phase 1, and it's parent points to A which is STOPPED. {code} root / A queues: root, A root / A / X reinitialize failed, but X is added to queues queues: root, A, X root / \ A B \ X new node X will not replace old one queues: root, A, X(value is not LeafQueue that in the tree) {code} was: CapacityScheduler will validate new configuration to make sure all existing queues are still present. But it seems not enough: 1.When we change one queue(name A) from leaf to parent, it will pass validation and add it's new child(X) to queues. And later root.reinitialize() will fail because of queue type has changed. 2.Then we add new parent queue(name B) with children(X), and change queue(A)'s state to STOPPED. This will apply successfully. but job submitted to queue(X) can never be scheduled. Because LeafQueue(X) has already been added in phase 1, and it's parent points to A which is STOPPED. root / A queues: root, A root / A / X reinitialize failed, but X is added to queues queues: root, A, X root / \ A B \ X new node X will not replace old one queues: root, A, X(value is not LeafQueue that in the tree) CapacityScheduler can't schedule job after misconfiguration --- Key: YARN-1829 URL: https://issues.apache.org/jira/browse/YARN-1829 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: PengZhang CapacityScheduler will validate new configuration to make sure all existing queues are still present. But it seems not enough: 1.When we change one queue(name A) from leaf to parent, it will pass validation and add it's new child(X) to queues. And later root.reinitialize() will fail because of queue type has changed. 2.Then we add new parent queue(name B) with children(X), and change queue(A)'s state to STOPPED. This will apply successfully. but job submitted to queue(X) can never be scheduled. Because LeafQueue(X) has already been added in phase 1, and it's parent points to A which is STOPPED. {code} root / A queues: root, A root / A / X reinitialize failed, but X is added to queues queues: root, A, X root / \ A B \ X new node X will not replace old one queues: root, A, X(value is not LeafQueue that in the tree) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
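To make the failure mode above concrete, here is a self-contained illustration (plain Java, not CapacityScheduler code) of what goes wrong when the parse step mutates the live queue map before reinitialization fails: the orphaned entry for X survives, and a later valid reload does not replace it.
{code}
import java.util.HashMap;
import java.util.Map;

public class StaleQueueMapDemo {
  // The live map that survives across reloads, like CapacityScheduler's queues map.
  static final Map<String, String> queues = new HashMap<String, String>();

  static void reload(String queueName, String queueValue, boolean failAfterParse) {
    if (!queues.containsKey(queueName)) {
      queues.put(queueName, queueValue);   // parse phase mutates the live map
    }
    if (failAfterParse) {
      // models root.reinitialize() failing because A's type changed
      throw new IllegalStateException("queue type changed during reinitialize");
    }
  }

  public static void main(String[] args) {
    queues.put("root", "ParentQueue");
    queues.put("A", "LeafQueue");
    try {
      reload("X", "LeafQueue under A (later STOPPED)", true);  // phase 1 fails, X leaks in
    } catch (IllegalStateException expected) {
    }
    reload("X", "LeafQueue under B", false);                   // phase 2 is valid
    // Still the stale phase-1 value, so jobs submitted to X are never scheduled.
    System.out.println(queues.get("X"));
  }
}
{code}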
[jira] [Commented] (YARN-1824) Make Windows client work with Linux/Unix cluster
[ https://issues.apache.org/jira/browse/YARN-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933059#comment-13933059 ] Steve Loughran commented on YARN-1824: -- My main point was: client code should not have to know the value of {{yarn.application.classpath}}, as it is something that YARN itself knows and which a client can only get wrong. To quote the Distributed Shell client:
{code}
// At some point we should not be required to add
// the hadoop specific classpaths to the env.
// It should be provided out of the box.
// For now setting all required classpaths including
// the classpath to . for the application jar
{code}
If there were an env variable YARN_APPLICATION_LIB which you could use when setting up a classpath, most of the pain in setting up a YARN AM classpath would be avoided. Make Windows client work with Linux/Unix cluster Key: YARN-1824 URL: https://issues.apache.org/jira/browse/YARN-1824 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jian He Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1824.1.patch, YARN-1824.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
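To illustrate the boilerplate being criticised, this is roughly what clients do today, sketched with the stock YarnConfiguration/Apps helpers. Note that the three-argument Apps.addToEnvironment joins entries with the client JVM's path separator, which is exactly the assumption that breaks for a Windows client talking to a Linux/Unix cluster.
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Apps;

public class AmClasspathSetup {
  // Builds the CLASSPATH entry of the AM launch environment from
  // yarn.application.classpath, the value the comment argues clients
  // should not have to know in the first place.
  static Map<String, String> buildAmEnvironment(Configuration conf) {
    Map<String, String> env = new HashMap<String, String>();
    // The container's working directory first, so the application jar is found.
    Apps.addToEnvironment(env, Environment.CLASSPATH.name(), Environment.PWD.$());
    for (String entry : conf.getTrimmedStrings(
        YarnConfiguration.YARN_APPLICATION_CLASSPATH,
        YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
      Apps.addToEnvironment(env, Environment.CLASSPATH.name(), entry.trim());
    }
    return env;
  }
}
{code}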
[jira] [Commented] (YARN-1542) Add unit test for public resource on viewfs
[ https://issues.apache.org/jira/browse/YARN-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933095#comment-13933095 ] Hadoop QA commented on YARN-1542: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631699/YARN-1542.v03.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.util.TestFSDownload {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3345//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3345//console This message is automatically generated. Add unit test for public resource on viewfs --- Key: YARN-1542 URL: https://issues.apache.org/jira/browse/YARN-1542 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1542.v01.patch, YARN-1542.v02.patch, YARN-1542.v03.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1812) Job stays in PREP state for long time after RM Restarts
[ https://issues.apache.org/jira/browse/YARN-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933106#comment-13933106 ] Hudson commented on YARN-1812: -- FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/508/]) YARN-1812. Fixed ResourceManager to synchrously renew tokens after recovery and thus recover app itself synchronously and avoid races with resyncing NodeManagers. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576843) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java Job stays in PREP state for long time after RM Restarts --- Key: YARN-1812 URL: https://issues.apache.org/jira/browse/YARN-1812 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1812.1.patch, YARN-1812.2.patch, YARN-1812.3.patch Steps followed: 1) start a sort job with 80 maps and 5 reducers 2) restart Resource manager when 60 maps and 0 reducers are finished 3) Wait for job to come out of PREP state. The job does not come out of PREP state after 7-8 mins. After waiting for 7-8 mins, test kills the job. However, Sort job should not take this long time to come out of PREP state -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1816) Succeeded application remains in accepted after RM restart
[ https://issues.apache.org/jira/browse/YARN-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933109#comment-13933109 ] Hudson commented on YARN-1816: -- FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/508/]) YARN-1816. Fixed ResourceManager to get RMApp correctly handle ATTEMPT_FINISHED event at ACCEPTED state that can happen after RM restarts. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576911) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java Succeeded application remains in accepted after RM restart -- Key: YARN-1816 URL: https://issues.apache.org/jira/browse/YARN-1816 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1816.1.patch {code} 2014-03-10 18:07:31,944|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:07:31,945|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCEhrt_qa defaultACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:08:02,125|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:08:03,198|beaver.machine|INFO|14/03/10 18:08:03 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:08:03,238|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:08:03,239|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:08:03,239|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCEhrt_qa defaultACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:08:33,390|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:08:34,437|beaver.machine|INFO|14/03/10 18:08:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:08:34,477|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:08:34,477|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:08:34,478|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCEhrt_qa defaultACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:09:04,628|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:09:05,688|beaver.machine|INFO|14/03/10 18:09:05 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:09:05,728|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:09:05,728|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL
[jira] [Commented] (YARN-1789) ApplicationSummary does not escape newlines in the app name
[ https://issues.apache.org/jira/browse/YARN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933103#comment-13933103 ] Hudson commented on YARN-1789: -- FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/508/]) YARN-1789. ApplicationSummary does not escape newlines in the app name. Contributed by Tsuyoshi OZAWA (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576960) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java ApplicationSummary does not escape newlines in the app name --- Key: YARN-1789 URL: https://issues.apache.org/jira/browse/YARN-1789 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: newbie Fix For: 2.4.0 Attachments: YARN-1789.1.patch YARN-side of MAPREDUCE-5778. ApplicationSummary is not escaping newlines in the app name. This can result in an application summary log entry that spans multiple lines when users are expecting one-app-per-line output. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1444) RM crashes when node resource request sent without corresponding off-switch request
[ https://issues.apache.org/jira/browse/YARN-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933102#comment-13933102 ] Hudson commented on YARN-1444: -- FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/508/]) YARN-1444. Fix CapacityScheduler to deal with cases where applications specify host/rack requests without off-switch request. Contributed by Wangda Tan. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576751) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java RM crashes when node resource request sent without corresponding off-switch request --- Key: YARN-1444 URL: https://issues.apache.org/jira/browse/YARN-1444 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Reporter: Robert Grandl Assignee: Wangda Tan Priority: Blocker Fix For: 2.4.0 Attachments: yarn-1444.ver1.patch, yarn-1444.ver2.patch I have tried to force reducers to execute on certain nodes. What I did is I changed for reduce tasks, the RMContainerRequestor#addResourceRequest(req.priority, ResourceRequest.ANY, req.capability) to RMContainerRequestor#addResourceRequest(req.priority, HOST_NAME, req.capability). However, this change lead to RM crashes when reducers needs to be assigned with the following exception: FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549) at java.lang.Thread.run(Thread.java:722) -- This message was sent by Atlassian JIRA (v6.2#6252)
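For reference, a minimal sketch of a well-formed ask at one priority: host- and rack-level requests accompanied by an off-switch (ResourceRequest.ANY) request. The scenario in the description replaced the ANY request with a host-level one, and the missing off-switch entry is what LeafQueue.assignContainers trips over. The host, rack and count values here are placeholders.
{code}
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class OffSwitchRequestExample {
  // Build the three resource requests a scheduler expects to see together
  // for a locality-constrained ask at a single priority.
  static List<ResourceRequest> buildAsk(String host, String rack,
      Priority priority, Resource capability, int numContainers) {
    ResourceRequest nodeLevel =
        ResourceRequest.newInstance(priority, host, capability, numContainers);
    ResourceRequest rackLevel =
        ResourceRequest.newInstance(priority, rack, capability, numContainers);
    // The off-switch request; dropping it is what led to the NPE above.
    ResourceRequest offSwitch =
        ResourceRequest.newInstance(priority, ResourceRequest.ANY, capability, numContainers);
    return Arrays.asList(nodeLevel, rackLevel, offSwitch);
  }
}
{code}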
[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933111#comment-13933111 ] Hudson commented on YARN-1389: -- FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/508/]) YARN-1389. Made ApplicationClientProtocol and ApplicationHistoryProtocol expose analogous getApplication(s)/Attempt(s)/Container(s) APIs. Contributed by Mayank Bansal. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577052) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs - Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: 2.4.0 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will 
return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1816) Succeeded application remains in accepted after RM restart
[ https://issues.apache.org/jira/browse/YARN-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933247#comment-13933247 ] Hudson commented on YARN-1816: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1700 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1700/]) YARN-1816. Fixed ResourceManager to get RMApp correctly handle ATTEMPT_FINISHED event at ACCEPTED state that can happen after RM restarts. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576911) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java Succeeded application remains in accepted after RM restart -- Key: YARN-1816 URL: https://issues.apache.org/jira/browse/YARN-1816 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1816.1.patch {code} 2014-03-10 18:07:31,944|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:07:31,945|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCEhrt_qa defaultACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:08:02,125|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:08:03,198|beaver.machine|INFO|14/03/10 18:08:03 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:08:03,238|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:08:03,239|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:08:03,239|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCEhrt_qa defaultACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:08:33,390|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:08:34,437|beaver.machine|INFO|14/03/10 18:08:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:08:34,477|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:08:34,477|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:08:34,478|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCEhrt_qa defaultACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:09:04,628|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:09:05,688|beaver.machine|INFO|14/03/10 18:09:05 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:09:05,728|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:09:05,728|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL
[jira] [Commented] (YARN-1789) ApplicationSummary does not escape newlines in the app name
[ https://issues.apache.org/jira/browse/YARN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933241#comment-13933241 ] Hudson commented on YARN-1789: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1700 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1700/]) YARN-1789. ApplicationSummary does not escape newlines in the app name. Contributed by Tsuyoshi OZAWA (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576960) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java ApplicationSummary does not escape newlines in the app name --- Key: YARN-1789 URL: https://issues.apache.org/jira/browse/YARN-1789 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: newbie Fix For: 2.4.0 Attachments: YARN-1789.1.patch YARN-side of MAPREDUCE-5778. ApplicationSummary is not escaping newlines in the app name. This can result in an application summary log entry that spans multiple lines when users are expecting one-app-per-line output. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1812) Job stays in PREP state for long time after RM Restarts
[ https://issues.apache.org/jira/browse/YARN-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933244#comment-13933244 ] Hudson commented on YARN-1812: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1700 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1700/]) YARN-1812. Fixed ResourceManager to synchrously renew tokens after recovery and thus recover app itself synchronously and avoid races with resyncing NodeManagers. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576843) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java Job stays in PREP state for long time after RM Restarts --- Key: YARN-1812 URL: https://issues.apache.org/jira/browse/YARN-1812 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1812.1.patch, YARN-1812.2.patch, YARN-1812.3.patch Steps followed: 1) start a sort job with 80 maps and 5 reducers 2) restart Resource manager when 60 maps and 0 reducers are finished 3) Wait for job to come out of PREP state. The job does not come out of PREP state after 7-8 mins. After waiting for 7-8 mins, test kills the job. However, Sort job should not take this long time to come out of PREP state -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1444) RM crashes when node resource request sent without corresponding off-switch request
[ https://issues.apache.org/jira/browse/YARN-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933240#comment-13933240 ] Hudson commented on YARN-1444: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1700 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1700/]) YARN-1444. Fix CapacityScheduler to deal with cases where applications specify host/rack requests without off-switch request. Contributed by Wangda Tan. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576751) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java RM crashes when node resource request sent without corresponding off-switch request --- Key: YARN-1444 URL: https://issues.apache.org/jira/browse/YARN-1444 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Reporter: Robert Grandl Assignee: Wangda Tan Priority: Blocker Fix For: 2.4.0 Attachments: yarn-1444.ver1.patch, yarn-1444.ver2.patch I have tried to force reducers to execute on certain nodes. What I did is I changed for reduce tasks, the RMContainerRequestor#addResourceRequest(req.priority, ResourceRequest.ANY, req.capability) to RMContainerRequestor#addResourceRequest(req.priority, HOST_NAME, req.capability). However, this change lead to RM crashes when reducers needs to be assigned with the following exception: FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549) at java.lang.Thread.run(Thread.java:722) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1591: - Attachment: YARN-1591.3.patch Fixed the dispatcher so that it does not throw YarnRuntimeException when InterruptedException is thrown in EventDispatcher#handle. IIUC, a YarnRuntimeException thrown from EventDispatcher#handle is not handled by AsyncDispatcher and leads to a needless crash. We should exit the thread gracefully in that case. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
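To make the described behaviour concrete, here is a minimal sketch of a handle() method that reacts to InterruptedException by restoring the interrupt status and returning, instead of wrapping it in an unchecked exception that nothing upstream handles. This only illustrates the approach described in the comment, not the actual YARN-1591 patch; the class name GracefulDispatcherSketch is invented.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class GracefulDispatcherSketch<E> {
  private final BlockingQueue<E> eventQueue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;

  public void handle(E event) {
    try {
      eventQueue.put(event);               // may block if the queue is bounded
    } catch (InterruptedException e) {
      if (!stopped) {
        // Log and return rather than throwing an unchecked exception,
        // so the dispatcher thread can exit cleanly during shutdown.
        System.err.println("Interrupted while dispatching " + event + ", exiting gracefully");
      }
      Thread.currentThread().interrupt();  // preserve the interrupt status
    }
  }

  public void stop() {
    stopped = true;
  }
}
{code}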
[jira] [Commented] (YARN-1444) RM crashes when node resource request sent without corresponding off-switch request
[ https://issues.apache.org/jira/browse/YARN-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933341#comment-13933341 ] Hudson commented on YARN-1444: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/]) YARN-1444. Fix CapacityScheduler to deal with cases where applications specify host/rack requests without off-switch request. Contributed by Wangda Tan. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576751) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java RM crashes when node resource request sent without corresponding off-switch request --- Key: YARN-1444 URL: https://issues.apache.org/jira/browse/YARN-1444 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Reporter: Robert Grandl Assignee: Wangda Tan Priority: Blocker Fix For: 2.4.0 Attachments: yarn-1444.ver1.patch, yarn-1444.ver2.patch I have tried to force reducers to execute on certain nodes. What I did is I changed for reduce tasks, the RMContainerRequestor#addResourceRequest(req.priority, ResourceRequest.ANY, req.capability) to RMContainerRequestor#addResourceRequest(req.priority, HOST_NAME, req.capability). However, this change lead to RM crashes when reducers needs to be assigned with the following exception: FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549) at java.lang.Thread.run(Thread.java:722) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1816) Succeeded application remains in accepted after RM restart
[ https://issues.apache.org/jira/browse/YARN-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933348#comment-13933348 ] Hudson commented on YARN-1816: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/]) YARN-1816. Fixed ResourceManager to get RMApp correctly handle ATTEMPT_FINISHED event at ACCEPTED state that can happen after RM restarts. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576911) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java Succeeded application remains in accepted after RM restart -- Key: YARN-1816 URL: https://issues.apache.org/jira/browse/YARN-1816 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1816.1.patch {code} 2014-03-10 18:07:31,944|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:07:31,945|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCEhrt_qa defaultACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:08:02,125|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:08:03,198|beaver.machine|INFO|14/03/10 18:08:03 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:08:03,238|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:08:03,239|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:08:03,239|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCEhrt_qa defaultACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:08:33,390|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:08:34,437|beaver.machine|INFO|14/03/10 18:08:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:08:34,477|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:08:34,477|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:08:34,478|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCEhrt_qa defaultACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:09:04,628|beaver.machine|INFO|RUNNING: 
/usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:09:05,688|beaver.machine|INFO|14/03/10 18:09:05 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:09:05,728|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:09:05,728|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933355#comment-13933355 ] Hadoop QA commented on YARN-1591: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634436/YARN-1591.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3346//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3346//console This message is automatically generated. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1812) Job stays in PREP state for long time after RM Restarts
[ https://issues.apache.org/jira/browse/YARN-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933345#comment-13933345 ] Hudson commented on YARN-1812: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/]) YARN-1812. Fixed ResourceManager to synchrously renew tokens after recovery and thus recover app itself synchronously and avoid races with resyncing NodeManagers. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576843) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java Job stays in PREP state for long time after RM Restarts --- Key: YARN-1812 URL: https://issues.apache.org/jira/browse/YARN-1812 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1812.1.patch, YARN-1812.2.patch, YARN-1812.3.patch Steps followed: 1) start a sort job with 80 maps and 5 reducers 2) restart Resource manager when 60 maps and 0 reducers are finished 3) Wait for job to come out of PREP state. The job does not come out of PREP state after 7-8 mins. After waiting for 7-8 mins, test kills the job. However, Sort job should not take this long time to come out of PREP state -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1789) ApplicationSummary does not escape newlines in the app name
[ https://issues.apache.org/jira/browse/YARN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933342#comment-13933342 ] Hudson commented on YARN-1789: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/]) YARN-1789. ApplicationSummary does not escape newlines in the app name. Contributed by Tsuyoshi OZAWA (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576960) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java ApplicationSummary does not escape newlines in the app name --- Key: YARN-1789 URL: https://issues.apache.org/jira/browse/YARN-1789 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: newbie Fix For: 2.4.0 Attachments: YARN-1789.1.patch YARN-side of MAPREDUCE-5778. ApplicationSummary is not escaping newlines in the app name. This can result in an application summary log entry that spans multiple lines when users are expecting one-app-per-line output. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933350#comment-13933350 ] Hudson commented on YARN-1389: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/]) YARN-1389. Made ApplicationClientProtocol and ApplicationHistoryProtocol expose analogous getApplication(s)/Attempt(s)/Container(s) APIs. Contributed by Mayank Bansal. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577052) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs - Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: 2.4.0 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, 
which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1512) Enhance CS to decouple scheduling from node heartbeats
[ https://issues.apache.org/jira/browse/YARN-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1512: Attachment: YARN-1512.patch Updated, ready I believe. Enhance CS to decouple scheduling from node heartbeats -- Key: YARN-1512 URL: https://issues.apache.org/jira/browse/YARN-1512 Project: Hadoop YARN Issue Type: Bug Reporter: Arun C Murthy Assignee: Arun C Murthy Attachments: YARN-1512.patch, YARN-1512.patch, YARN-1512.patch Enhance CS to decouple scheduling from node heartbeats; a prototype has improved latency significantly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1512) Enhance CS to decouple scheduling from node heartbeats
[ https://issues.apache.org/jira/browse/YARN-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933449#comment-13933449 ] Hadoop QA commented on YARN-1512: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634446/YARN-1512.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3347//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3347//console This message is automatically generated. Enhance CS to decouple scheduling from node heartbeats -- Key: YARN-1512 URL: https://issues.apache.org/jira/browse/YARN-1512 Project: Hadoop YARN Issue Type: Bug Reporter: Arun C Murthy Assignee: Arun C Murthy Attachments: YARN-1512.patch, YARN-1512.patch, YARN-1512.patch Enhance CS to decouple scheduling from node heartbeats; a prototype has improved latency significantly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933478#comment-13933478 ] Jonathan Eagles commented on YARN-1769: --- Hi, Tom. Can you comment on the findbugs warnings that are introduced as part of this patch when you get a chance? CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
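The improvement proposed in the description above can be sketched as: on each node update, keep evaluating the node even if the application already holds a reservation elsewhere, and swap that reservation for an allocation when the incoming node has enough space. The classes and fields below (ReservationSwapSketch, Node, Request) are a deliberately simplified model of that idea, not the CapacityScheduler code.
{code}
class ReservationSwapSketch {
  static class Node { int availableMB; }
  static class Request { int askMB; boolean reservedElsewhere; }

  // Hypothetical decision helper illustrating the "swap a reservation for an
  // actual allocation" idea from the description.
  static String onNodeUpdate(Node node, Request req) {
    if (node.availableMB >= req.askMB) {
      if (req.reservedElsewhere) {
        req.reservedElsewhere = false;   // release the reservation held on the other node
      }
      node.availableMB -= req.askMB;
      return "ALLOCATED";
    }
    // Today the scheduler can stop looking once the reservation limit is hit;
    // the proposal is to keep evaluating incoming nodes like this one.
    return "SKIPPED";
  }
}
{code}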
[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933541#comment-13933541 ] Naren Koneru commented on YARN-1577: Hi Zhijie, Nice, thanks for letting me know. I will use that in llama and also submit a patch for yarn unmanagedamlauncher later today. regards Naren Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Naren Koneru Priority: Blocker Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1830) TestRMRestart.testQueueMetricsOnRMRestart failure
Karthik Kambatla created YARN-1830: -- Summary: TestRMRestart.testQueueMetricsOnRMRestart failure Key: YARN-1830 URL: https://issues.apache.org/jira/browse/YARN-1830 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla TestRMRestart.testQueueMetricsOnRMRestart fails intermittently as follows (reported on YARN-1815): {noformat} java.lang.AssertionError: expected:37 but was:38 ... at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1728) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1682) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933580#comment-13933580 ] Karthik Kambatla commented on YARN-1815: The tests pass locally. Filed YARN-1830 for the TestRMRestart failure, and YARN-1591 covers the TestResourceTrackerService failure. [~vinodkv] - mind taking a look at the updated patch? RM should recover only Managed AMs -- Key: YARN-1815 URL: https://issues.apache.org/jira/browse/YARN-1815 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, yarn-1815-2.patch, yarn-1815-2.patch RM should not recover unmanaged AMs until YARN-1823 is fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1591: Assignee: Tsuyoshi OZAWA (was: Vinod Kumar Vavilapalli) TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1830) TestRMRestart.testQueueMetricsOnRMRestart failure
[ https://issues.apache.org/jira/browse/YARN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933586#comment-13933586 ] Zhijie Shen commented on YARN-1830: --- See the same failure reported on YARN-1389 TestRMRestart.testQueueMetricsOnRMRestart failure - Key: YARN-1830 URL: https://issues.apache.org/jira/browse/YARN-1830 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla TestRMRestart.testQueueMetricsOnRMRestart fails intermittently as follows (reported on YARN-1815): {noformat} java.lang.AssertionError: expected:37 but was:38 ... at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1728) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1682) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1512) Enhance CS to decouple scheduling from node heartbeats
[ https://issues.apache.org/jira/browse/YARN-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933582#comment-13933582 ] Arun C Murthy commented on YARN-1512: - Looks like YARN-1591 tracks the failure with TestResourceTrackerService. Enhance CS to decouple scheduling from node heartbeats -- Key: YARN-1512 URL: https://issues.apache.org/jira/browse/YARN-1512 Project: Hadoop YARN Issue Type: Bug Reporter: Arun C Murthy Assignee: Arun C Murthy Attachments: YARN-1512.patch, YARN-1512.patch, YARN-1512.patch Enhance CS to decouple scheduling from node heartbeats; a prototype has improved latency significantly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens
[ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-1795: Description: Running the Oozie unit tests against a Hadoop build with YARN-713 causes many of the tests to be flakey. Doing some digging, I found that they were failing because some of the MR jobs were failing; I found this in the syslog of the failed jobs: {noformat} 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_00_0: Container launch failed for container_1394064846476_0013_01_03 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} I did some debugging and found that the NMTokenCache has a different port number than what's being looked up. For example, the NMTokenCache had one token with address 192.168.1.77:58217 but ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. The 58213 address comes from ContainerLauncherImpl's constructor. So when the Container is being launched it somehow has a different port than when the token was created. Any ideas why the port numbers wouldn't match? Update: This also happens in an actual cluster, not just Oozie's unit tests was: Running the Oozie unit tests against a Hadoop build with YARN-713 causes many of the tests to be flakey. 
Doing some digging, I found that they were failing because some of the MR jobs were failing; I found this in the syslog of the failed jobs: {noformat} 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_00_0: Container launch failed for container_1394064846476_0013_01_03 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} I did some debugging and found that the NMTokenCache has a different port number than what's being looked up. For example, the NMTokenCache had one token with address 192.168.1.77:58217 but ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. The 58213 address comes from ContainerLauncherImpl's constructor. So when the Container is being launched it somehow has a different port than when the token was created. Any ideas why the port numbers wouldn't match? Summary: After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens (was: Oozie tests are flakey after YARN-713) We've now seen this problem in an actual cluster, not just Oozie's unit tests; so this is definitely a problem and not something funny we're
[jira] [Updated] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens
[ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1795: --- Priority: Blocker (was: Critical) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens Key: YARN-1795 URL: https://issues.apache.org/jira/browse/YARN-1795 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Robert Kanter Priority: Blocker Attachments: org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt, syslog Running the Oozie unit tests against a Hadoop build with YARN-713 causes many of the tests to be flakey. Doing some digging, I found that they were failing because some of the MR jobs were failing; I found this in the syslog of the failed jobs: {noformat} 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_00_0: Container launch failed for container_1394064846476_0013_01_03 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} I did some debugging and found that the NMTokenCache has a different port number than what's being looked up. For example, the NMTokenCache had one token with address 192.168.1.77:58217 but ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. The 58213 address comes from ContainerLauncherImpl's constructor. So when the Container is being launched it somehow has a different port than when the token was created. Any ideas why the port numbers wouldn't match? Update: This also happens in an actual cluster, not just Oozie's unit tests -- This message was sent by Atlassian JIRA (v6.2#6252)
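The InvalidToken failure described above boils down to a lookup keyed by the node's host:port address. The sketch below is a simplified illustration of that failure mode, not the real NMTokenCache API; the class NMTokenLookupSketch and its methods are invented. It shows how a port mismatch between token issuance and container launch produces the "No NMToken sent for ..." error seen in the stack trace.
{code}
import java.util.HashMap;
import java.util.Map;

class NMTokenLookupSketch {
  private final Map<String, String> tokensByNodeAddr = new HashMap<>();

  void receiveToken(String nodeAddr, String token) {
    tokensByNodeAddr.put(nodeAddr, token);
  }

  String tokenFor(String nodeAddr) {
    String token = tokensByNodeAddr.get(nodeAddr);
    if (token == null) {
      // Mirrors the "No NMToken sent for <host:port>" error in the syslog above.
      throw new IllegalStateException("No NMToken sent for " + nodeAddr);
    }
    return token;
  }

  public static void main(String[] args) {
    NMTokenLookupSketch cache = new NMTokenLookupSketch();
    cache.receiveToken("192.168.1.77:58217", "token-A");   // port the token was issued for
    try {
      cache.tokenFor("192.168.1.77:58213");                // port the launcher resolved -> miss
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}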
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933637#comment-13933637 ] Hadoop QA commented on YARN-1811: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634241/YARN-1811.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3348//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3348//console This message is automatically generated. RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933639#comment-13933639 ] Jian He commented on YARN-1815: --- Thanks Karthik for the patch. For now, it should be fine to move UMA to the Failed state, as the UMA is not saving its final state and RM restart doesn't support UMAs. The core change looks good. Test case: we need a more thorough test that verifies the UMA is moved to the Failed state after RM restart, using two MockRMs like the ones in TestRMRestart. The bigger problem is that if an unmanaged application is not added back to completedApps in RMAppManager after RM restart via the FinalTransition, it will never be removed from the state store. We remove applications from the state store when completedApps in RMAppManager grows beyond the max-app limit. RM should recover only Managed AMs -- Key: YARN-1815 URL: https://issues.apache.org/jira/browse/YARN-1815 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, yarn-1815-2.patch, yarn-1815-2.patch RM should not recover unmanaged AMs until YARN-1823 is fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933820#comment-13933820 ] Tsuyoshi OZAWA commented on YARN-1591: -- The test failures look unrelated. [~jianhe], can you take a look? TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933825#comment-13933825 ] Robert Kanter commented on YARN-1811: - TestResourceTrackerService is flakey (and fails without the patch): YARN-1591 RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933831#comment-13933831 ] Thomas Graves commented on YARN-1769: - The findbugs warnings can be ignored. They flag inconsistent synchronization of a class variable that is sometimes referenced inside synchronized methods and sometimes not; it doesn't matter here whether that variable is accessed under synchronization. I'll add it to the excludes file. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
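For readers unfamiliar with this findbugs category, the flagged pattern looks roughly like the sketch below: a field written under a lock but also read without it triggers an inconsistent-synchronization warning (IS2_INCONSISTENT_SYNC) even when a slightly stale read is harmless. This is an illustrative example only, with invented class and field names, not the YARN-1769 code.
{code}
class InconsistentSyncSketch {
  private int reservedContainers;   // hypothetical counter

  synchronized void reserve() {
    reservedContainers++;           // synchronized write
  }

  synchronized void unreserve() {
    reservedContainers--;           // synchronized write
  }

  int snapshot() {
    // Unsynchronized read: findbugs reports inconsistent synchronization here,
    // but a slightly stale value is acceptable for this use, so the warning
    // can be excluded rather than "fixed".
    return reservedContainers;
  }
}
{code}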
[jira] [Created] (YARN-1831) Job should be marked as Falied if it is recovered from commit.
Yesha Vora created YARN-1831: Summary: Job should be marked as Falied if it is recovered from commit. Key: YARN-1831 URL: https://issues.apache.org/jira/browse/YARN-1831 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora If Resource manager is restarted when a job is in commit state, The job is not able to recovered after RM restart and it is marked as Killed. The job status should be Failed instead killed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch exclude findbugs warnings. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933860#comment-13933860 ] Hadoop QA commented on YARN-1769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634508/YARN-1769.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3349//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fullfill the request. The other place for improvement is currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to gets it resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1831) Job should be marked as Falied if it is recovered from commit.
[ https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1831: --- Assignee: Xuan Gong Job should be marked as Falied if it is recovered from commit. -- Key: YARN-1831 URL: https://issues.apache.org/jira/browse/YARN-1831 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Xuan Gong If Resource manager is restarted when a job is in commit state, The job is not able to recovered after RM restart and it is marked as Killed. The job status should be Failed instead killed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1831) Job should be marked as Falied if it is recovered from commit.
[ https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933869#comment-13933869 ] Xuan Gong commented on YARN-1831: - Close this as duplicate Job should be marked as Falied if it is recovered from commit. -- Key: YARN-1831 URL: https://issues.apache.org/jira/browse/YARN-1831 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Xuan Gong If Resource manager is restarted when a job is in commit state, The job is not able to be recovered after RM restart and it is marked as Killed. The job status should be Failed instead killed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1831) Job should be marked as Falied if it is recovered from commit.
[ https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-1831: - Description: If Resource manager is restarted when a job is in commit state, The job is not able to be recovered after RM restart and it is marked as Killed. The job status should be Failed instead killed. was: If Resource manager is restarted when a job is in commit state, The job is not able to recovered after RM restart and it is marked as Killed. The job status should be Failed instead killed. Job should be marked as Falied if it is recovered from commit. -- Key: YARN-1831 URL: https://issues.apache.org/jira/browse/YARN-1831 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Xuan Gong If Resource manager is restarted when a job is in commit state, The job is not able to be recovered after RM restart and it is marked as Killed. The job status should be Failed instead killed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933866#comment-13933866 ] Karthik Kambatla commented on YARN-1811: Thanks Robert. Comments: # WebAppUtils#getProxyHostsAndPortsForAmFilter is a little dense for my liking. We should probably add comments for the various ifs and fors :) # Nit: Okay with not fixing it. I find using Joiner more readable. {code} StringBuilder sb = new StringBuilder(); for (String proxy : proxies) { sb.append(proxy.split(":")[0]).append(AmIpFilter.PROXY_HOSTS_DELIMITER); } sb.setLength(sb.length() - 1); {code} # AmIpFilter has a couple of public fields we are removing. We can leave them there for compatibility's sake (in theory) and maybe deprecate them as well. If the others involved think it's okay, we should probably just make AmIpFilter @Private. # AmIpFilter#findRedirectUrl - we could use a Map<String (host:port), proxyUriBase>, so we don't need the following for loop. {code} for (String proxyUriBase : proxyUriBases) { try { URL url = new URL(proxyUriBase); if (host.equals(url.getHost() + ":" + url.getPort())) { addr = proxyUriBase; break; } } catch(MalformedURLException e) { // ignore } } {code} # Also, we should at least log the MalformedURLException above and not add to the map. RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
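The Joiner alternative mentioned in the review above would look roughly like the sketch below, assuming Guava is on the classpath and reusing the AmIpFilter.PROXY_HOSTS_DELIMITER constant from the quoted snippet as the delimiter; the class and helper names (ProxyHostsSketch, proxyHosts) are made up for illustration.
{code}
import java.util.ArrayList;
import java.util.List;

import com.google.common.base.Joiner;

class ProxyHostsSketch {
  static String proxyHosts(List<String> proxies, String delimiter) {
    List<String> hosts = new ArrayList<>();
    for (String proxy : proxies) {
      hosts.add(proxy.split(":")[0]);          // keep only the host part of host:port
    }
    return Joiner.on(delimiter).join(hosts);   // no manual setLength() trimming needed
  }
}
{code}
The same Joiner call also pairs naturally with the suggested Map keyed by host:port, since the map's key set can be joined directly.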
[jira] [Commented] (YARN-1831) Job should be marked as Falied if it is recovered from commit.
[ https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933867#comment-13933867 ] Xuan Gong commented on YARN-1831: - create a MapReduce ticket. Let us start from there: https://issues.apache.org/jira/browse/MAPREDUCE-5795 Job should be marked as Falied if it is recovered from commit. -- Key: YARN-1831 URL: https://issues.apache.org/jira/browse/YARN-1831 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Xuan Gong If Resource manager is restarted when a job is in commit state, The job is not able to be recovered after RM restart and it is marked as Killed. The job status should be Failed instead killed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1831) Job should be marked as Falied if it is recovered from commit.
[ https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved YARN-1831. - Resolution: Duplicate Job should be marked as Falied if it is recovered from commit. -- Key: YARN-1831 URL: https://issues.apache.org/jira/browse/YARN-1831 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Xuan Gong If Resource manager is restarted when a job is in commit state, The job is not able to be recovered after RM restart and it is marked as Killed. The job status should be Failed instead killed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933873#comment-13933873 ] Robert Kanter commented on YARN-1811: - I'll make those changes and put up a new patch. I think we should make AmIpFilter {{@Private}}; my understanding is that it's meant only to be used internally by YARN for the AM anyway. RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch Upmerged the patch to latest trunk. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933938#comment-13933938 ] Jian He commented on YARN-1815: --- bq. it should be fine to move UMA to Failed state as UMA is not saving the final state On second thought, if the UMA just finished successfully, will it also be moved to the FAILED state after RM restart? That doesn't seem right. RM should recover only Managed AMs -- Key: YARN-1815 URL: https://issues.apache.org/jira/browse/YARN-1815 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, yarn-1815-2.patch, yarn-1815-2.patch RM should not recover unmanaged AMs until YARN-1823 is fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933943#comment-13933943 ] Hadoop QA commented on YARN-1769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634513/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3350//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934005#comment-13934005 ] Jian He commented on YARN-1591: --- In the scope of this JIRA, the reason TestResourceTrackerService is failing is that testNodeRegistrationWithContainers and testNodeRegistrationWithContainers are not stopping the RM, causing a "cluster metrics already exists" exception, so stopping those two RMs should be enough? Btw, there's already a global variable rm to record the RM, and that RM is stopped in tearDown(); we may use that. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
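A minimal sketch of that suggestion (assuming the class-level MockRM field named rm that the comment refers to, and that the sketch lives alongside MockRM in the resourcemanager test package; the exact tearDown shape is illustrative, not the committed test code):
{code}
import org.junit.After;

public class TestResourceTrackerServiceSketch {
  private MockRM rm;   // class-level field mentioned above; set by each test that starts an RM

  @After
  public void tearDown() {
    // Stopping the RM after each test prevents the "cluster metrics already exists"
    // failure when the next test case starts another RM in the same JVM.
    if (rm != null) {
      rm.stop();
    }
  }
}
{code}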
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934027#comment-13934027 ] Zhijie Shen commented on YARN-1809: --- AHS web-UI is still not able to show tags due to YARN-1462. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch After YARN-953, the web-UI of the generic history service provides more information than that of the RM, namely the details about app attempts and containers. It's good to provide similar web-UIs, but retrieve the data from separate sources, i.e., the RM cache and the history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-1811: Attachment: YARN-1811.patch New patch addresses Karthik's comments. RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1822) Revisit AM link being broken for work preserving restart
[ https://issues.apache.org/jira/browse/YARN-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter resolved YARN-1822. - Resolution: Invalid YARN-1811 is being done differently, and this is no longer needed Revisit AM link being broken for work preserving restart Key: YARN-1822 URL: https://issues.apache.org/jira/browse/YARN-1822 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Robert Kanter We should revisit the issue in YARN-1811 as it may require changes once we have work-preserving restarts. Currently, the AmIpFilter is given the active RM at AM initialization/startup, so when the RM fails over and the AM is restarted, this gets recalculated properly. However, with work-preserving restart, this will now point to the inactive RM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1591: - Attachment: YARN-1591.5.patch [~jianhe] Oops, I had overlooked that rm is defined locally in the test cases. Thank you for pointing that out. +1 on your idea. Updated the patch to use the RM defined in the class field. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1717: - Attachment: YARN-1717.10.patch [~zjshen], thanks for the review. I have implemented your suggestions in the attached patch, with the following notes. bq. 2. Should these aging mechanism related configs have a leveldb section in the config name? Because they're only related to the leveldb impl. I moved ttl-interval-ms to the leveldb section, but kept ttl-ms and ttl-enable in the timeline store section since I think those could be useful for all stores. bq. 5. It seems not necessary to refactor getEntity into two methods, doesn't it? Thanks for pointing this out. I was able to remove a number of changes that were only needed for the old deletion strategy. bq. 7. In discardOldEntities, if one IOException happens, is it good to move on with the following discarding operations? I added a catch for the exception, logged an error, and continued deletions for the next entity type. Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
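A rough sketch of the catch-and-continue behavior described for point 7 above (discardOldEntities is the method named in the thread; LOG, getEntityTypes() and deleteEntitiesOlderThan() are assumed helpers for illustration, not the actual patch):
{code}
// Sketch: keep aging out other entity types even if one type fails with an IOException.
void discardOldEntities(long retainMilliseconds) {
  for (String entityType : getEntityTypes()) {        // assumed: enumerate stored entity types
    try {
      deleteEntitiesOlderThan(entityType, retainMilliseconds);  // assumed per-type deletion helper
    } catch (IOException e) {
      LOG.error("Error discarding old entities of type " + entityType, e);
      // move on to the next entity type instead of aborting the whole pass
    }
  }
}
{code}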
[jira] [Assigned] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead
[ https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1536: --- Assignee: Anubhav Dhoot (was: Karthik Kambatla) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead - Key: YARN-1536 URL: https://issues.apache.org/jira/browse/YARN-1536 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot Priority: Minor Labels: newbie Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1809: -- Attachment: YARN-1809.2.patch Uploaded a new patch with the following changes: 1. Rebase against YARN-1389 2. Do more refactoring on the App(s)/Attempt/Container page classes 3. Fix the bugs. I've done some local testing of the App(s)/Attempt/Container pages of the RM web UI, which is so far so good, except for some errors caused by the log URL, which will be handled in YARN-1685. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.2.patch After YARN-953, the web-UI of the generic history service provides more information than that of the RM, namely the details about app attempts and containers. It's good to provide similar web-UIs, but retrieve the data from separate sources, i.e., the RM cache and the history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934155#comment-13934155 ] Hadoop QA commented on YARN-1811: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634541/YARN-1811.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3351//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3351//console This message is automatically generated. RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-808) ApplicationReport does not clearly tell that the attempt is running or not
[ https://issues.apache.org/jira/browse/YARN-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934165#comment-13934165 ] Zhijie Shen commented on YARN-808: -- After YARN-1389, we have separate APIs to get the application attempt report(s), where we can get the application attempt state. IMHO, we no longer need to have additional attempt state in application report. Any idea? ApplicationReport does not clearly tell that the attempt is running or not -- Key: YARN-808 URL: https://issues.apache.org/jira/browse/YARN-808 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-808.1.patch When an app attempt fails and is being retried, ApplicationReport immediately gives the new attemptId and non-null values of host etc. There is no way for clients to know that the attempt is running other than connecting to it and timing out on invalid host. Solution would be to expose the attempt state or return a null value for host instead of N/A -- This message was sent by Atlassian JIRA (v6.2#6252)
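For context, a minimal sketch of reading the attempt state through those separate APIs rather than a new field in ApplicationReport (assumes a started YarnClient instance named yarnClient and a known ApplicationId appId; the variable names are illustrative):
{code}
// Sketch: query the current attempt's state via the attempt-report API.
ApplicationReport app = yarnClient.getApplicationReport(appId);
ApplicationAttemptReport attempt =
    yarnClient.getApplicationAttemptReport(app.getCurrentApplicationAttemptId());
System.out.println("Attempt state: " + attempt.getYarnApplicationAttemptState());
{code}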
[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934169#comment-13934169 ] Hadoop QA commented on YARN-1717: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634551/YARN-1717.10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3353//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3353//console This message is automatically generated. Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934174#comment-13934174 ] Arun C Murthy commented on YARN-796: Back to this, some thoughts: * Admin interface ** Labels are specified by admins (node configuration, dynamic add/remove via rmadmin). ** Each scheduler (CS, FS) can pick how they want labels specified in their configs ** Dynamically added labels are, initially, not persisted across RM restarts. So, these need to be manually edited into capacity-scheduler.xml etc. ** By default, all nodes have a *default* label, but admins can explicitly set a list of labels and drop the *default* label. ** Queues have label ACLs i.e. admins can specify, per queue, what labels can be used by applications per queue * End-user interface ** Applications can ask for containers on nodes with specific labels as part of the RR; however, host-specific RRs with labels are illegal i.e. labels are allowed only for rack * RRs: results in InvalidResourceRequestException ** RR with a non-existent label (point in time) is illegal: results in InvalidResourceRequestException ** RR with label without appropriate ACL results in InvalidResourceRequestException (do we want a special InvalidResourceRequestACLException?) ** Initially, RRs can ask for multiple labels with the expectation that it's an AND operation Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934176#comment-13934176 ] Jian He commented on YARN-1591: --- Thanks for the patch [~ozawa] ! Patch looks good, +1 TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934174#comment-13934174 ] Arun C Murthy edited comment on YARN-796 at 3/13/14 10:00 PM: -- Back to this, some thoughts: * Admin interface ** Labels are specified by admins (node configuration, dynamic add/remove via rmadmin). ** Each scheduler (CS, FS) can pick how they want labels specified in their configs ** Dynamically added labels are, initially, not persisted across RM restarts. So, these need to be manually edited into yarn-site.xml, ACLs into capacity-scheduler.xml etc. ** By default, all nodes have a *default* label, but admins can explicitly set a list of labels and drop the *default* label. ** Queues have label ACLs i.e. admins can specify, per queue, what labels can be used by applications per queue * End-user interface ** Applications can ask for containers on nodes with specific labels as part of the RR; however, host-specific RRs with labels are illegal i.e. labels are allowed only for rack * RRs: results in InvalidResourceRequestException ** RR with a non-existent label (point in time) is illegal: results in InvalidResourceRequestException ** RR with label without appropriate ACL results in InvalidResourceRequestException (do we want a special InvalidResourceRequestACLException?) ** Initially, RRs can ask for multiple labels with the expectation that it's an AND operation was (Author: acmurthy): Back to this, some thoughts: * Admin interface ** Labels are specified by admins (node configuration, dynamic add/remove via rmadmin). ** Each scheduler (CS, FS) can pick how they want labels specified in their configs ** Dynamically added labels are, initially, not persisted across RM restarts. So, these need to be manually edited into capacity-scheduler.xml etc. ** By default, all nodes have a *default* label, but admins can explicitly set a list of labels and drop the *default* label. ** Queues have label ACLs i.e. admins can specify, per queue, what labels can be used by applications per queue * End-user interface ** Applications can ask for containers on nodes with specific labels as part of the RR; however, host-specific RRs with labels are illegal i.e. labels are allowed only for rack * RRs: results in InvalidResourceRequestException ** RR with a non-existent label (point in time) is illegal: results in InvalidResourceRequestException ** RR with label without appropriate ACL results in InvalidResourceRequestException (do we want a special InvalidResourceRequestACLException?) ** Initially, RRs can ask for multiple labels with the expectation that it's an AND operation Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934182#comment-13934182 ] Karthik Kambatla commented on YARN-1811: Changes look good to me. I'll defer the @Private on AmIpFilter to Vinod. [~vinodkv] - can you take a look at the latest patch from Robert? RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM
[ https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934181#comment-13934181 ] Robert Kanter commented on YARN-1811: - Both failures look unrelated and already have JIRAs: TestResourceTrackerService (YARN-1591) and TestRMRestart (YARN-1830) RM HA: AM link broken if the AM is on nodes other than RM - Key: YARN-1811 URL: https://issues.apache.org/jira/browse/YARN-1811 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, YARN-1811.patch When using RM HA, if you click on the Application Master link in the RM web UI while the job is running, you get an Error 500: -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934189#comment-13934189 ] Sandy Ryza commented on YARN-796: - Makes a lot of sense to me. One nit: bq. Each scheduler (CS, FS) can pick how they want labels specified in their configs Correct me if I'm misunderstanding what you mean here, but currently neither scheduler has node-specific stuff in its configuration. Updating the scheduler config when a node is added or removed from the cluster seems cumbersome. Should labels not be included in the NodeManager configuration like Resources are? Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934200#comment-13934200 ] Hadoop QA commented on YARN-1591: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634550/YARN-1591.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3352//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3352//console This message is automatically generated. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead
[ https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1536: Attachment: yarn-1536.patch Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead - Key: YARN-1536 URL: https://issues.apache.org/jira/browse/YARN-1536 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot Priority: Minor Labels: newbie Attachments: yarn-1536.patch Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934246#comment-13934246 ] Arun C Murthy commented on YARN-796: [~sandyr] - Sorry, if it wasn't clear. I meant the ACLs for labels should be specified in each scheduler. So, for e.g.: {noformat} <property> <name>yarn.scheduler.capacity.root.A.labels</name> <value>labelA, labelX</value> </property> <property> <name>yarn.scheduler.capacity.root.B.labels</name> <value>labelB, labelY</value> </property> {noformat} Makes sense? Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934247#comment-13934247 ] Hadoop QA commented on YARN-1809: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634560/YARN-1809.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3354//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3354//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3354//console This message is automatically generated. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.2.patch After YARN-953, the web-UI of generic history service is provide more information than that of RM, the details about app attempt and container. It's good to provide similar web-UIs, but retrieve the data from separate source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934245#comment-13934245 ] Tsuyoshi OZAWA commented on YARN-1591: -- [~jianhe], what do you think about the timeout? I've never seen the timeout locally. Should the timeout value of testGetNextHeartBeatInterval be larger? TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
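If the test's timeout is the JUnit per-test timeout, raising it would be along these lines (a sketch only; the current value and the test body are not shown in this thread, so both are assumptions):
{code}
// Sketch: the JUnit 4 annotation takes the timeout in milliseconds;
// a larger value gives slow Jenkins machines more headroom.
@Test(timeout = 60000)   // e.g. 60s instead of a smaller assumed value
public void testGetNextHeartBeatInterval() throws Exception {
  // ... existing test body ...
}
{code}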
[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1809: -- Attachment: YARN-1809.3.patch Fix the findbugs Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch After YARN-953, the web-UI of generic history service is provide more information than that of RM, the details about app attempt and container. It's good to provide similar web-UIs, but retrieve the data from separate source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934288#comment-13934288 ] Zhijie Shen commented on YARN-1717: --- Some minor things on the patch: 1. Rename the class to EntityDiscardThread or something? {code} + private class DeletionThread extends Thread { {code} 2. Have a warn-level log here? {code} + } catch (InterruptedException ignored) { + } {code} Another arguable issue: it is possible that the entity has expired according to its timestamp while some of its events are still within the TTL. We do deletion according to the entity's timestamp and at entity granularity, so the events that are still alive are likely to be deleted as well. Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.10.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch, YARN-1717.9.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
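A sketch of what the warn-level log in point 2 could look like (LOG and the ttlIntervalMs field are assumptions for illustration; only the shape of the catch clause is the point here, not the actual patch):
{code}
// Sketch: inside the deletion thread's wait loop, surface the interrupt instead of swallowing it.
try {
  Thread.sleep(ttlIntervalMs);   // assumed field holding the configured ttl-interval-ms
} catch (InterruptedException e) {
  LOG.warn("Timeline entity deletion thread interrupted; stopping aging pass", e);
  Thread.currentThread().interrupt();  // preserve the interrupt status for the caller
}
{code}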
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934293#comment-13934293 ] Alejandro Abdelnur commented on YARN-796: - Arun, doing a recap on the config, is this what you mean? The ResourceManager {{yarn-site.xml}} would specify the valid labels systemwide (you didn't suggest this, but it prevents label typos going unnoticed): {code} <property> <name>yarn.resourcemanager.valid-labels</name> <value>labelA, labelB, labelX</value> </property> {code} The NodeManagers' yarn-site.xml would specify the labels of the node: {code} <property> <name>yarn.nodemanager.labels</name> <value>labelA, labelX</value> </property> {code} The scheduler configuration, in its queue configuration, would specify what labels can be used when requesting allocations in that queue: {code} <property> <name>yarn.scheduler.capacity.root.A.allowed-labels</name> <value>labelA</value> </property> {code} Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934304#comment-13934304 ] Jian He commented on YARN-1591: --- Seems there is one more issue here... somehow the TestResourceTrackerService test suite crashes randomly. I think you found a good clue earlier: bq. I found a test failure by an uncaught exception after running lots tests. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1824) Make Windows client work with Linux/Unix cluster
[ https://issues.apache.org/jira/browse/YARN-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934330#comment-13934330 ] Vinod Kumar Vavilapalli commented on YARN-1824: --- Also, rename DEFAULT_YARN_APPLICATION_CLASSPATH_CROSS_ENV to DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH? Make Windows client work with Linux/Unix cluster Key: YARN-1824 URL: https://issues.apache.org/jira/browse/YARN-1824 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jian He Assignee: Jian He Attachments: YARN-1824.1.patch, YARN-1824.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-1771: Issue Type: Improvement (was: Bug) many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing a resource that belong in the public cache. We see 7 getFileStatus calls made for each of these resource. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cindy Li updated YARN-1658: --- Attachment: YARN1658.2.patch Thanks Vinod for the comment. I've changed it to reduce overriding to only getting the filter class. Uploaded the latest patch. Tested on a secure cluster. Webservice should redirect to active RM when HA is enabled. --- Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li Labels: YARN Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.patch When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cindy Li updated YARN-1658: --- Attachment: YARN1658.3.patch Webservice should redirect to active RM when HA is enabled. --- Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li Labels: YARN Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, YARN1658.patch When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead
[ https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934363#comment-13934363 ] Hadoop QA commented on YARN-1536: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634571/yarn-1536.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 14 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3355//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3355//console This message is automatically generated. Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead - Key: YARN-1536 URL: https://issues.apache.org/jira/browse/YARN-1536 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Anubhav Dhoot Priority: Minor Labels: newbie Attachments: yarn-1536.patch Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934368#comment-13934368 ] Hadoop QA commented on YARN-1809: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634576/YARN-1809.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3356//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3356//console This message is automatically generated. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch After YARN-953, the web-UI of generic history service is provide more information than that of RM, the details about app attempt and container. It's good to provide similar web-UIs, but retrieve the data from separate source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1591: - Attachment: YARN-1591.6.patch This patch works well locally. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934396#comment-13934396 ] Hudson commented on YARN-1771: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5325 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5325/]) YARN-1771. Reduce the number of NameNode operations during localization of public resources using a cache. Contributed by Sangjin Lee (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577391) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestLocalDistributedCacheManager.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing a resource that belong in the public cache. We see 7 getFileStatus calls made for each of these resource. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934426#comment-13934426 ] Sangjin Lee commented on YARN-1771: --- Thanks Chris! It would be great if you could commit this to branch-2.4 too... many getFileStatus calls made from node manager for localizing a public distributed cache resource -- Key: YARN-1771 URL: https://issues.apache.org/jira/browse/YARN-1771 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Fix For: 3.0.0, 2.4.0 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, yarn-1771.patch We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing a resource that belong in the public cache. We see 7 getFileStatus calls made for each of these resource. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1393#comment-1393 ] Hadoop QA commented on YARN-1591: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634600/YARN-1591.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3357//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3357//console This message is automatically generated. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934447#comment-13934447 ] Hadoop QA commented on YARN-1658: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634598/YARN1658.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3358//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3358//console This message is automatically generated. Webservice should redirect to active RM when HA is enabled. --- Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li Labels: YARN Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, YARN1658.patch When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk
[ https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934482#comment-13934482 ] Tsuyoshi OZAWA commented on YARN-1591: -- Fixed EventDispatcher#handle so that it no longer throws YarnRuntimeException when an InterruptedException occurs. IIUC, a YarnRuntimeException thrown from EventDispatcher#handle is not handled in AsyncDispatcher and leads to a needless crash. We should exit the thread gracefully in that case. The latest patch includes this fix. TestResourceTrackerService fails randomly on trunk -- Key: YARN-1591 URL: https://issues.apache.org/jira/browse/YARN-1591 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Tsuyoshi OZAWA Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too -- This message was sent by Atlassian JIRA (v6.2#6252)
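To make the described fix concrete, a minimal sketch of the pattern follows (assumed structure only, not the actual YARN-1591 patch): when the dispatcher is interrupted while enqueueing an event, the handler restores the thread's interrupt status and returns instead of wrapping the exception in a YarnRuntimeException that AsyncDispatcher would not handle.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of an event handler that tolerates interruption during shutdown.
class EventDispatcherSketch<T> {
  private final BlockingQueue<T> eventQueue = new LinkedBlockingQueue<>();

  public void handle(T event) {
    try {
      eventQueue.put(event);
    } catch (InterruptedException e) {
      // Most likely the service is stopping: preserve the interrupt flag
      // and exit gracefully rather than throwing a runtime exception.
      Thread.currentThread().interrupt();
    }
  }
}
{code}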
[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934487#comment-13934487 ] Vinod Kumar Vavilapalli commented on YARN-1658: --- Both failures are existing issues: TestResourceTrackerService tracked at YARN-1591 and TestRMRestart at YARN-1830. The latest patch looks good to me. +1. Checking this in. Webservice should redirect to active RM when HA is enabled. --- Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li Labels: YARN Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, YARN1658.patch When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934519#comment-13934519 ] Hudson commented on YARN-1658: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5326 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5326/]) YARN-1658. Modified web-app framework to let standby RMs redirect web-service calls to the active RM. Contributed by Cindy Li. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577408) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Dispatcher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Router.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMDispatcher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java Webservice should redirect to active RM when HA is enabled. --- Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li Labels: YARN Fix For: 2.4.0 Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, YARN1658.patch When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525. -- This message was sent by Atlassian JIRA (v6.2#6252)
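As a rough illustration of the redirect behavior this commit adds: a filter on the standby RM's web app can answer incoming web-service calls with a redirect to the same path on the active RM. This is a sketch only; the placeholder methods isStandby() and getActiveRMWebAddress() are invented here and are not the actual RMWebAppFilter API.
{code:java}
import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Hypothetical filter: redirect web-service calls from a standby RM. */
public class StandbyRedirectFilter implements Filter {

  @Override
  public void init(FilterConfig conf) { /* no-op for the sketch */ }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;

    if (isStandby()) {
      // Preserve the requested path and query string on the redirect target.
      String target = getActiveRMWebAddress() + httpReq.getRequestURI();
      if (httpReq.getQueryString() != null) {
        target += "?" + httpReq.getQueryString();
      }
      httpRes.setHeader("Location", target);
      httpRes.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);
      return;
    }
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() { }

  // Placeholders: a real implementation would consult the RM HA context
  // and the configured web address of the active RM.
  private boolean isStandby() { return false; }
  private String getActiveRMWebAddress() { return "http://active-rm:8088"; }
}
{code}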
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934522#comment-13934522 ] Junping Du commented on YARN-796: - bq. ResourceManager yarn-site.xml would specify the valid labels systemwide (you didn't suggest this, but it prevents label typos going unnoticed): I don't think a label typo is a big issue. Restricting labels on the RM side could prevent adding a new label for a new application on newly registering nodes, since we have no way to refresh the yarn-site config dynamically. Doesn't it? A sketch of the idea under discussion follows. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
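For context on the trade-off being debated, here is a hypothetical sketch of the "valid labels in yarn-site.xml" approach. The property name yarn.node-labels.allowed-labels is invented purely for illustration and is not an actual YARN configuration key; the downside Junping points out is that accepting a genuinely new label would require editing this static list and restarting or reconfiguring the RM.
{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

// Hypothetical whitelist of node labels loaded from yarn-site.xml.
class LabelWhitelist {
  private final Set<String> allowed;

  LabelWhitelist(Configuration conf) {
    // Invented property name, for illustration only.
    allowed = new HashSet<>(Arrays.asList(
        conf.getStrings("yarn.node-labels.allowed-labels", new String[0])));
  }

  /** Reject labels not declared system-wide, which catches typos early. */
  boolean isValid(String label) {
    return allowed.contains(label);
  }
}
{code}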