[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919153#comment-13919153 ] Sunil G commented on YARN-1769: --- Hi Thomas CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers when there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
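For readers skimming the digest, the following is a minimal, hypothetical sketch of the improvement described in the issue: keep evaluating incoming node heartbeats and, when a node has enough free space for a request that is currently only reserved elsewhere, swap the reservation for a real allocation. The helper names (Resources.fitsIn, findReservedContainer, unreserve, allocate) and the surrounding structure are illustrative assumptions, not the attached patch.

{code}
// Hypothetical sketch: on a node heartbeat, if the node can fit a request that
// is currently only reserved on some other node, release that reservation and
// perform a real allocation instead of skipping the node.
Resource required = request.getCapability();
if (Resources.fitsIn(required, node.getAvailableResource())) {
  RMContainer reserved = application.findReservedContainer(priority); // hypothetical helper
  if (reserved != null) {
    // Give the reserved resources back first so queue/user limits are checked
    // against the real allocation rather than the standing reservation.
    unreserve(application, priority, reserved.getReservedNode(), reserved); // hypothetical helper
  }
  allocate(application, node, priority, request); // hypothetical helper: real allocation
}
{code}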
[jira] [Commented] (YARN-1768) yarn kill non-existent application is too verbose
[ https://issues.apache.org/jira/browse/YARN-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919157#comment-13919157 ] Ravi Prakash commented on YARN-1768: Thanks Tsuyoshi! Patch lgtm. +1. I'll commit it tomorrow to trunk and branch-2 unless anyone has a comment. yarn kill non-existent application is too verbose - Key: YARN-1768 URL: https://issues.apache.org/jira/browse/YARN-1768 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1768.1.patch, YARN-1768.2.patch, YARN-1768.3.patch Instead of catching ApplicationNotFound and logging a simple app-not-found message, the whole stack trace is logged. -- This message was sent by Atlassian JIRA (v6.2#6252)
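As a rough illustration of the behavior the issue asks for (not the attached patch), the client could catch ApplicationNotFoundException and print a one-line message instead of letting a stack trace reach the console; the variable names below are assumed for the example.

{code}
// Illustrative sketch only: handle a kill request for an application id the RM
// does not know about with a short message rather than a full stack trace.
try {
  yarnClient.killApplication(appId);
} catch (ApplicationNotFoundException e) {
  System.out.println("Application with id '" + appId + "' doesn't exist in RM.");
}
{code}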
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919161#comment-13919161 ] Sunil G commented on YARN-1769: --- Hi Thomas In the LeafQueue assignContainer call, the reserve() call happens from the else branch. There is a check there as below: if ((!scheduler.getConfiguration().getReservationContinueLook()) || (canAllocContainer) || (rmContainer != null)) { Is there any scenario where none of these three cases holds? Perhaps only on a first-time allocation, when the application cannot assign a container, and even then the chances of reaching that point seem low. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers when there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919170#comment-13919170 ] Mayank Bansal commented on YARN-1389: - Thanks [~zjshen] for the review bq. 1. ApplicationClientProtocol and ApplicationHistoryProtocol are able to share a base interface now? I think we decided we will keep the interfaces separate. bq. 2. Javadoc in ApplicationHistoryProtocol says the data is obtained from AHS, which is not correct. Done bq. 3. YarnClientImpl misses the implementation for getting attempts/container/containers Done bq. 4. Users are not able to get completed application list via YarnClient Done bq. 5. Like RMApp, make createApplicationAttemptReport/ContainerReport as part of RMAppAttempt/RMContainer. These are just utility functions; do you think they are needed in RMAppAttempt and RMContainer? Updating the latest patch. Thanks, Mayank ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs -- Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1389-1.patch, YARN-1389-2.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1389: Attachment: YARN-1389-3.patch ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs -- Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1748) hadoop-yarn-server-tests packages core-site.xml breaking downstream tests
[ https://issues.apache.org/jira/browse/YARN-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919254#comment-13919254 ] Hudson commented on YARN-1748: -- FAILURE: Integrated in Hadoop-Yarn-trunk #499 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/499/]) YARN-1748. Excluded core-site.xml from hadoop-yarn-server-tests package's jar and thus avoid breaking downstream tests. Contributed by Sravya Tirukkovalur. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573795) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/pom.xml hadoop-yarn-server-tests packages core-site.xml breaking downstream tests - Key: YARN-1748 URL: https://issues.apache.org/jira/browse/YARN-1748 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Sravya Tirukkovalur Assignee: Sravya Tirukkovalur Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1748-1.patch, YARN-1748-1.patch Jars should not package config files, as this might come into the classpaths of clients causing the clients to break. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1765) Write test cases to verify that killApplication API works in RM HA
[ https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919245#comment-13919245 ] Hudson commented on YARN-1765: -- FAILURE: Integrated in Hadoop-Yarn-trunk #499 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/499/]) YARN-1765. Added test cases to verify that killApplication API works across ResourceManager failover. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573735) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Write test cases to verify that killApplication API works in RM HA -- Key: YARN-1765 URL: https://issues.apache.org/jira/browse/YARN-1765 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1765.1.patch, YARN-1765.2.patch, YARN-1765.2.patch, YARN-1765.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings
[ https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919250#comment-13919250 ] Hudson commented on YARN-1729: -- FAILURE: Integrated in Hadoop-Yarn-trunk #499 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/499/]) YARN-1729. Made TimelineWebServices deserialize the string primary- and secondary-filters param into the JSON-compatible object. Contributed by Billie Rinaldi. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573825) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestGenericObjectMapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java TimelineWebServices always passes primary and secondary filters as strings -- Key: YARN-1729 URL: https://issues.apache.org/jira/browse/YARN-1729 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 2.4.0 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch Primary filters and secondary filter values can be arbitrary json-compatible Object. The web services should determine if the filters specified as query parameters are objects or strings before passing them to the store. -- This message was sent by Atlassian JIRA (v6.2#6252)
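As an aside for readers of the digest, the behavior described in the YARN-1729 summary can be sketched roughly as follows: try to parse each filter query parameter as a JSON value and fall back to the raw string when it is not valid JSON. This is an illustrative sketch using Jackson directly, not the GenericObjectMapper code from the patch.

{code}
import java.io.IOException;
import org.codehaus.jackson.map.ObjectMapper;

// Illustrative parsing of a primary/secondary filter value from the query
// string: numbers, booleans, objects, and arrays come back as their
// JSON-compatible Java types; anything that is not valid JSON stays a String.
class FilterValueParser {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  static Object parseFilterValue(String raw) {
    try {
      return MAPPER.readValue(raw, Object.class);
    } catch (IOException e) {
      return raw; // plain string filter value
    }
  }
}
{code}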
[jira] [Assigned] (YARN-1206) Container logs link is broken on RM web UI after application finished
[ https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-1206: Assignee: Rohith Container logs link is broken on RM web UI after application finished - Key: YARN-1206 URL: https://issues.apache.org/jira/browse/YARN-1206 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith Priority: Blocker With log aggregation disabled, when container is running, its logs link works properly, but after the application is finished, the link shows 'Container does not exist.' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1206) Container logs link is broken on RM web UI after application finished
[ https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1206: - Attachment: YARN-1206.patch Attaching patch for fixing this issue. Please review Container logs link is broken on RM web UI after application finished - Key: YARN-1206 URL: https://issues.apache.org/jira/browse/YARN-1206 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith Priority: Blocker Attachments: YARN-1206.patch With log aggregation disabled, when container is running, its logs link works properly, but after the application is finished, the link shows 'Container does not exist.' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1765) Write test cases to verify that killApplication API works in RM HA
[ https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919280#comment-13919280 ] Hudson commented on YARN-1765: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/]) YARN-1765. Added test cases to verify that killApplication API works across ResourceManager failover. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573735) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Write test cases to verify that killApplication API works in RM HA -- Key: YARN-1765 URL: https://issues.apache.org/jira/browse/YARN-1765 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1765.1.patch, YARN-1765.2.patch, YARN-1765.2.patch, YARN-1765.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1748) hadoop-yarn-server-tests packages core-site.xml breaking downstream tests
[ https://issues.apache.org/jira/browse/YARN-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919289#comment-13919289 ] Hudson commented on YARN-1748: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/]) YARN-1748. Excluded core-site.xml from hadoop-yarn-server-tests package's jar and thus avoid breaking downstream tests. Contributed by Sravya Tirukkovalur. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573795) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/pom.xml hadoop-yarn-server-tests packages core-site.xml breaking downstream tests - Key: YARN-1748 URL: https://issues.apache.org/jira/browse/YARN-1748 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Sravya Tirukkovalur Assignee: Sravya Tirukkovalur Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1748-1.patch, YARN-1748-1.patch Jars should not package config files, as this might come into the classpaths of clients causing the clients to break. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1675) Application does not change to RUNNING after being scheduled
[ https://issues.apache.org/jira/browse/YARN-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919281#comment-13919281 ] Hudson commented on YARN-1675: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/]) YARN-1675. Added the previously missed new file. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573736) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestKillApplicationWithRMHA.java Application does not change to RUNNING after being scheduled Key: YARN-1675 URL: https://issues.apache.org/jira/browse/YARN-1675 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Trupti Dhavle I dont see any stacktraces in logs. But the debug logs show negative vcores- {noformat} 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(808)) - assignContainers: node=hor11n39.gq1.ygridcore.net #applications=5 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application application_1390986573180_0269 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 currentConsumption=2048 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0269 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(911)) - post-assignContainers for application application_1390986573180_0269 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 currentConsumption=2048 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0269 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application application_1390986573180_0272 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 currentConsumption=2048 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0272 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(911)) - post-assignContainers for application application_1390986573180_0272 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 
currentConsumption=2048 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0272 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application application_1390986573180_0273 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 currentConsumption=2048 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0273 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(911)) - post-assignContainers for application
[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings
[ https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919285#comment-13919285 ] Hudson commented on YARN-1729: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/]) YARN-1729. Made TimelineWebServices deserialize the string primary- and secondary-filters param into the JSON-compatible object. Contributed by Billie Rinaldi. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573825) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestGenericObjectMapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java TimelineWebServices always passes primary and secondary filters as strings -- Key: YARN-1729 URL: https://issues.apache.org/jira/browse/YARN-1729 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 2.4.0 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch Primary filters and secondary filter values can be arbitrary json-compatible Object. The web services should determine if the filters specified as query parameters are objects or strings before passing them to the store. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919288#comment-13919288 ] Hudson commented on YARN-1758: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/]) YARN-1758. Fixed ResourceManager to not mandate the presence of site specific configuration files and thus fix failures in downstream tests. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573695) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/FileSystemBasedConfigurationProvider.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java MiniYARNCluster broken post YARN-1666 - Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1758.1.patch, YARN-1758.2.patch NPE seen when trying to use MiniYARNCluster -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1704) Review LICENSE and NOTICE to reflect new levelDB related libraries being used
[ https://issues.apache.org/jira/browse/YARN-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919284#comment-13919284 ] Hudson commented on YARN-1704: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1691 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1691/]) YARN-1704. Modified LICENSE and NOTICE files to reflect newly used levelDB related libraries. Contributed by Billie Rinaldi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573702) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/LICENSE.txt * /hadoop/common/trunk/hadoop-yarn-project/NOTICE.txt Review LICENSE and NOTICE to reflect new levelDB releated libraries being used -- Key: YARN-1704 URL: https://issues.apache.org/jira/browse/YARN-1704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1704.1.patch, YARN-1704.2.patch, YARN-1704.3.patch Make any changes necessary in LICENSE and NOTICE related to dependencies introduced by the application timeline store. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-257) NM should gracefully handle a full local disk
[ https://issues.apache.org/jira/browse/YARN-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919295#comment-13919295 ] Sunil G commented on YARN-257: -- Maybe the NM can do some level of handling by itself in a disk-full scenario in the first place. The NM's LocalDirAllocator hands out a local path to write to from the good list of directories, using a round-robin algorithm based on the space currently available. If many tasks ask for paths from the same set of local directories, each allocation is made based on the availability at that instant; however, the same path may already have been given out to other tasks that are still writing sequentially, so the space promised to those earlier allocations is not considered when the next allocation is made from the same path. [Assuming a few earlier allocated tasks are writing at that time.] It is not possible to account exactly for that earlier allotted space, nor to predict the disk write speed. Could we instead predict a disk-full scenario rather than acting only when it happens? For example, the current health-check mechanism verifies access permissions etc. to identify good and bad directories at a 2-minute interval. If the space is almost full (say 95% used, or only 5*100Mb remaining), it would be better to move that directory to the bad list. Alternatively, the LocalDirAllocator could check for a high percentage of disk used and not assign such a directory to a task. These measures might help keep new tasks from failing because of an imminent disk-full condition. NM should gracefully handle a full local disk - Key: YARN-257 URL: https://issues.apache.org/jira/browse/YARN-257 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe When a local disk becomes full, the node will fail every container launched on it because the container is unable to localize. It tries to create an app-specific directory under each of the local and log directories. If any of those directory creations fails (due to lack of free space), the container fails. It would be nice if the node could continue to launch containers using the space available on other disks rather than failing all containers trying to launch on the node. This is somewhat related to YARN-91 but is centered around the disk becoming full rather than the disk failing. -- This message was sent by Atlassian JIRA (v6.2#6252)
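A rough sketch of the threshold idea suggested above, assuming a hypothetical helper and threshold value; the actual health-check and LocalDirAllocator code paths are not shown.

{code}
import java.io.File;

// Illustrative check: treat a local dir as "almost full" once its used fraction
// crosses a threshold (e.g. 0.95f), so it can be moved to the bad-dirs list or
// skipped by the allocator before writes actually start failing.
class DiskUsageCheck {
  static boolean isAlmostFull(File dir, float maxUsedFraction) {
    long total = dir.getTotalSpace();
    if (total == 0) {
      return true; // cannot stat the volume; be conservative
    }
    float used = 1.0f - ((float) dir.getUsableSpace() / total);
    return used > maxUsedFraction;
  }
}
{code}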
[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919301#comment-13919301 ] Hadoop QA commented on YARN-1389: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632479/YARN-1389-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3239//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3239//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3239//console This message is automatically generated. ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs -- Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1206) Container logs link is broken on RM web UI after application finished
[ https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919317#comment-13919317 ] Hadoop QA commented on YARN-1206: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632490/YARN-1206.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3240//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3240//console This message is automatically generated. Container logs link is broken on RM web UI after application finished - Key: YARN-1206 URL: https://issues.apache.org/jira/browse/YARN-1206 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith Priority: Blocker Attachments: YARN-1206.patch With log aggregation disabled, when container is running, its logs link works properly, but after the application is finished, the link shows 'Container does not exist.' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1445) Separate FINISHING and FINISHED state in YarnApplicationState
[ https://issues.apache.org/jira/browse/YARN-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919440#comment-13919440 ] Jason Lowe commented on YARN-1445: -- bq. Then, it is possible that AM is unregistered, and RM tells the client that the application is still running. When the client moves on to contact AM, AM has proceeded and exited before being able to respond the client request. This race will always exist and is inherent with asynchronous processes. The client could check the RM and the app could really be RUNNING, but by the time the client gets around to contacting the app the AM has rushed through the FINISHING and FINISHED state and could be gone by the time the client gets there. That's why ClientServiceDelegate retries on errors and re-evaluates whether to go to the AM or history server on each retry. Separate FINISHING and FINISHED state in YarnApplicationState - Key: YARN-1445 URL: https://issues.apache.org/jira/browse/YARN-1445 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1445.1.patch, YARN-1445.2.patch, YARN-1445.3.patch, YARN-1445.4.patch, YARN-1445.5.patch, YARN-1445.5.patch, YARN-1445.6.patch Today, we will transmit both RMAppState.FINISHING and RMAppState.FINISHED to YarnApplicationState.FINISHED. -- This message was sent by Atlassian JIRA (v6.2#6252)
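To make the retry pattern Jason describes concrete, here is a hedged sketch; the real logic lives in MapReduce's ClientServiceDelegate, and resolveProxy plus the exact exception handling below are simplified assumptions, not that code.

{code}
// Illustrative retry loop: on every attempt, re-resolve whether to talk to the
// AM or the history server, because the AM can finish and exit between checks.
// Assumes maxRetries >= 1; not the actual ClientServiceDelegate implementation.
JobStatus getJobStatusWithRetry(JobID jobId, int maxRetries) throws IOException {
  IOException lastError = null;
  for (int i = 0; i < maxRetries; i++) {
    try {
      // hypothetical helper that re-checks RUNNING vs FINISHED and picks AM or history server
      return resolveProxy(jobId).getJobStatus(jobId);
    } catch (IOException e) {
      lastError = e; // AM may have rushed through FINISHING/FINISHED; retry and re-resolve
    }
  }
  throw lastError;
}
{code}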
[jira] [Updated] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1730: - Description: Although the leveldb writes are performed atomically in a batch, a start time for the entity needs to be identified before each write. Thus a per-entity write lock should be acquired. (was: The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write.) Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch, YARN-1730.4.patch, YARN-1730.5.patch, YARN-1730.6.patch Although the leveldb writes are performed atomically in a batch, a start time for the entity needs to be identified before each write. Thus a per-entity write lock should be acquired. -- This message was sent by Atlassian JIRA (v6.2#6252)
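A minimal sketch of per-entity write locking in the spirit of the description, assuming an in-memory lock map keyed by entity type and id; getOrAssignStartTime and writeBatch are placeholders for the actual leveldb operations, not the patch.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative per-entity locking: the "find or assign start time, then write
// the batch" sequence is serialized per entity, while different entities can
// still be written concurrently.
class EntityWriteLocks {
  private final ConcurrentMap<String, Object> locks =
      new ConcurrentHashMap<String, Object>();

  void put(String entityType, String entityId) {
    String key = entityType + "/" + entityId;
    Object lock = locks.get(key);
    if (lock == null) {
      Object created = new Object();
      Object existing = locks.putIfAbsent(key, created);
      lock = (existing == null) ? created : existing;
    }
    synchronized (lock) {
      long startTime = getOrAssignStartTime(key); // placeholder: read or create the start time
      writeBatch(key, startTime);                 // placeholder: atomic leveldb batch write
    }
  }

  private long getOrAssignStartTime(String key) { return System.currentTimeMillis(); }
  private void writeBatch(String key, long startTime) { /* leveldb batch write elided */ }
}
{code}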
[jira] [Commented] (YARN-1765) Write test cases to verify that killApplication API works in RM HA
[ https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919470#comment-13919470 ] Hudson commented on YARN-1765: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1716 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1716/]) YARN-1765. Added test cases to verify that killApplication API works across ResourceManager failover. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573735) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Write test cases to verify that killApplication API works in RM HA -- Key: YARN-1765 URL: https://issues.apache.org/jira/browse/YARN-1765 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1765.1.patch, YARN-1765.2.patch, YARN-1765.2.patch, YARN-1765.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1675) Application does not change to RUNNING after being scheduled
[ https://issues.apache.org/jira/browse/YARN-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919471#comment-13919471 ] Hudson commented on YARN-1675: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1716 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1716/]) YARN-1675. Added the previously missed new file. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573736) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestKillApplicationWithRMHA.java Application does not change to RUNNING after being scheduled Key: YARN-1675 URL: https://issues.apache.org/jira/browse/YARN-1675 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Trupti Dhavle I dont see any stacktraces in logs. But the debug logs show negative vcores- {noformat} 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(808)) - assignContainers: node=hor11n39.gq1.ygridcore.net #applications=5 2014-01-29 18:42:26,357 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application application_1390986573180_0269 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 currentConsumption=2048 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0269 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(911)) - post-assignContainers for application application_1390986573180_0269 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0269 headRoom=memory:22528, vCores:0 currentConsumption=2048 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0269 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,358 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application application_1390986573180_0272 2014-01-29 18:42:26,358 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0272 headRoom=memory:18432, vCores:-2 currentConsumption=2048 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0272 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(911)) - post-assignContainers for application application_1390986573180_0272 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0272 headRoom=memory:18432, 
vCores:-2 currentConsumption=2048 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0272 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,359 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(827)) - pre-assignContainers for application application_1390986573180_0273 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(326)) - showRequests: application=application_1390986573180_0273 headRoom=memory:18432, vCores:-2 currentConsumption=2048 2014-01-29 18:42:26,359 DEBUG scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:showRequests(330)) - showRequests: application=application_1390986573180_0273 request={Priority: 0, Capability: memory:2048, vCores:1, # Containers: 0, Location: *, Relax Locality: true} 2014-01-29 18:42:26,360 DEBUG capacity.LeafQueue (LeafQueue.java:assignContainers(911)) - post-assignContainers for application
[jira] [Commented] (YARN-1748) hadoop-yarn-server-tests packages core-site.xml breaking downstream tests
[ https://issues.apache.org/jira/browse/YARN-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919479#comment-13919479 ] Hudson commented on YARN-1748: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1716 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1716/]) YARN-1748. Excluded core-site.xml from hadoop-yarn-server-tests package's jar and thus avoid breaking downstream tests. Contributed by Sravya Tirukkovalur. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573795) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/pom.xml hadoop-yarn-server-tests packages core-site.xml breaking downstream tests - Key: YARN-1748 URL: https://issues.apache.org/jira/browse/YARN-1748 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Sravya Tirukkovalur Assignee: Sravya Tirukkovalur Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1748-1.patch, YARN-1748-1.patch Jars should not package config files, as this might come into the classpaths of clients causing the clients to break. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings
[ https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919475#comment-13919475 ] Hudson commented on YARN-1729: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1716 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1716/]) YARN-1729. Made TimelineWebServices deserialize the string primary- and secondary-filters param into the JSON-compatible object. Contributed by Billie Rinaldi. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1573825) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestGenericObjectMapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java TimelineWebServices always passes primary and secondary filters as strings -- Key: YARN-1729 URL: https://issues.apache.org/jira/browse/YARN-1729 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 2.4.0 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch Primary filters and secondary filter values can be arbitrary json-compatible Object. The web services should determine if the filters specified as query parameters are objects or strings before passing them to the store. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919506#comment-13919506 ] Thomas Graves commented on YARN-1769: - If canAllocContainer is false then you can't reserve another container. This could happen if you don't have any containers to unreserve when you hit the reservation limits and this node doesn't have space available.
if ((!scheduler.getConfiguration().getReservationContinueLook()) // without the feature, always reserve like we previously did
    || (canAllocContainer) // if we hit our reservation limit and there is no available space on this node, don't reserve another one
    || (rmContainer != null)) { // if this was called because the node already had a reservation, we need to make sure it gets book-kept as a re-reservation
I can simplify this a bit. I don't really need the !scheduler.getConfiguration().getReservationContinueLook() check anymore since canAllocContainer defaults to true in that case. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers when there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
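For clarity, the simplification Thomas mentions could read roughly as follows. This is an editorial reading of his comment, not the attached patch; the reserve(...) call shown is a stand-in for the actual LeafQueue method.

{code}
// With continue-look disabled, canAllocContainer stays true, so the explicit
// configuration check is redundant and the condition reduces to two cases.
if (canAllocContainer || rmContainer != null) {
  // Either we may still reserve another container, or this node already holds a
  // reservation for this request and must be book-kept as a re-reservation.
  reserve(application, priority, node, rmContainer, container); // stand-in for the actual LeafQueue call
}
// Otherwise: at the reservation limit with nothing to unreserve here, so skip this node.
{code}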
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670-b23.patch aggregated log writer can write more log data than it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
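A hedged sketch of the fix direction described in the issue: record the file length first, then copy at most that many bytes so the data can never overrun the declared Log Length. The stream variables and the exact way the header is written are simplified assumptions, not the AggregatedLogFormat code.

{code}
// Simplified illustration: cap the copy at the length we declared up front,
// and track the remaining count with a long (the same int-vs-long issue noted
// for curRead in the reader). 'out' is assumed to be a DataOutputStream.
long declaredLength = logFile.length();
out.writeUTF(Long.toString(declaredLength)); // length header the reader will parse
InputStream in = new FileInputStream(logFile);
try {
  byte[] buf = new byte[65536];
  long remaining = declaredLength;
  while (remaining > 0) {
    int toRead = (int) Math.min(buf.length, remaining);
    int read = in.read(buf, 0, toRead);
    if (read == -1) {
      break; // file shrank underneath us; stop rather than pad
    }
    out.write(buf, 0, read);
    remaining -= read;
  }
} finally {
  in.close();
}
{code}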
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670.patch Attaching the patch for trunk, branch2 and branch23. aggregated log writer can write more log data than it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919544#comment-13919544 ] Hadoop QA commented on YARN-1670: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632518/YARN-1670.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3241//console This message is automatically generated. aggregated log writer can write more log data than it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said the length was. It would be nice if we could somehow tell the user it might be truncated, but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch update if check in assignContainer CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of the number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
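A toy sketch of the idea in the description: keep scanning incoming nodes and, when one can fit the pending request, release an existing reservation and allocate there instead of giving up. This only illustrates the concept with hypothetical names; it is not the assignContainer change in the attached patch.
{code}
import java.util.Map;
import java.util.Set;

public class ReservationSwapSketch {
  /**
   * If the incoming node has enough free memory for the request, allocate there and
   * release one of the application's existing reservations. Returns true when a swap happened.
   */
  public static boolean trySwap(String nodeId, Map<String, Long> freeMemoryMb,
      long requestMb, Set<String> reservedNodes) {
    Long free = freeMemoryMb.get(nodeId);
    if (free == null || free < requestMb) {
      return false;                              // node cannot fit the container, keep the reservation
    }
    if (!reservedNodes.isEmpty()) {
      String victim = reservedNodes.iterator().next();
      reservedNodes.remove(victim);              // unreserve on the old node
    }
    freeMemoryMb.put(nodeId, free - requestMb);  // "allocate" on the incoming node
    return true;
  }
}
{code}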
[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state
[ https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919548#comment-13919548 ] Rohith commented on YARN-1752: -- Previous Hadoop QA failure is not because of patch. What is the procedure to rerun the HadoopQA? Unexpected Unregistered event at Attempt Launched state --- Key: YARN-1752 URL: https://issues.apache.org/jira/browse/YARN-1752 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch, YARN-1752.4.patch {code} 2014-02-21 14:56:03,453 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: UNREGISTERED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:695) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
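The InvalidStateTransitonException above means the attempt's state table has no entry for the UNREGISTERED event while in the LAUNCHED state. A toy state table is sketched below to show why an unanticipated (state, event) pair produces exactly this error and how registering the transition avoids it; it is not the RMAppAttemptImpl StateMachineFactory definition or the attached patch.
{code}
import java.util.HashMap;
import java.util.Map;

public class AttemptStateMachineSketch {
  enum State { LAUNCHED, FINISHING, FAILED }
  enum Event { REGISTERED, UNREGISTERED, CONTAINER_FINISHED }

  private final Map<State, Map<Event, State>> table = new HashMap<State, Map<Event, State>>();
  private State current = State.LAUNCHED;

  public AttemptStateMachineSketch() {
    Map<Event, State> atLaunched = new HashMap<Event, State>();
    // Without this entry, an UNREGISTERED event at LAUNCHED throws, as in the stack trace above.
    atLaunched.put(Event.UNREGISTERED, State.FINISHING);
    table.put(State.LAUNCHED, atLaunched);
  }

  public void handle(Event event) {
    Map<Event, State> transitions = table.get(current);
    if (transitions == null || !transitions.containsKey(event)) {
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    current = transitions.get(event);
  }
}
{code}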
[jira] [Updated] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1670: Attachment: YARN-1670.patch Updated patch for trunk and branch-2. aggregated log writer can write more log data than it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said the length was. It would be nice if we could somehow tell the user it might be truncated, but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data than it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919580#comment-13919580 ] Hadoop QA commented on YARN-1670: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632525/YARN-1670.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3243//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3243//console This message is automatically generated. aggregated log writer can write more log data than it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small. We should have the write() routine stop once it has written whatever it said the length was. It would be nice if we could somehow tell the user it might be truncated, but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. 
while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
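For the reader-side note above, this is a minimal sketch of the loop with curRead widened to a long so that a fileLength larger than Integer.MAX_VALUE cannot overflow the counter. The names follow the description but are illustrative, not the actual AggregatedLogFormat code.
{code}
import java.io.IOException;
import java.io.InputStream;

public class BoundedReadSketch {
  /** Read at most fileLength bytes; curRead is a long so files over 2 GB do not overflow it. */
  public static long readAContainerLog(InputStream in, long fileLength, byte[] buf)
      throws IOException {
    long curRead = 0;
    int len = 0;
    while (len != -1 && curRead < fileLength) {
      int toRead = (int) Math.min(buf.length, fileLength - curRead);
      len = in.read(buf, 0, toRead);
      if (len > 0) {
        curRead += len;   // with an int counter this would wrap past Integer.MAX_VALUE
      }
    }
    return curRead;
  }
}
{code}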
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919592#comment-13919592 ] Hadoop QA commented on YARN-1769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632523/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3242//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3242//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3242//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Anytime it hits the limit of the number reserved it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919661#comment-13919661 ] Vinod Kumar Vavilapalli commented on YARN-1730: --- I wish there were more tests, but testing these write locks isn't easy. So I am fine for now. The latest patch looks good to me. +1. Checking this in. Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch, YARN-1730.4.patch, YARN-1730.5.patch, YARN-1730.6.patch Although the leveldb writes are performed atomically in a batch, a start time for the entity needs to identified before each write. Thus a per-entity write lock should be acquired. -- This message was sent by Atlassian JIRA (v6.2#6252)
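A minimal sketch of per-entity write locking as described above: serialize the start-time lookup and the batched write for a given entity id. The lock map and method names are illustrative only; the actual LeveldbTimelineStore patch also has to manage the lifetime of unused locks.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

public class PerEntityLockSketch {
  private final ConcurrentMap<String, ReentrantLock> locks =
      new ConcurrentHashMap<String, ReentrantLock>();

  /** Serialize start-time resolution and the batched write for a single entity id. */
  public void writeEntity(String entityId, Runnable resolveStartTimeAndWriteBatch) {
    ReentrantLock lock = locks.get(entityId);
    if (lock == null) {
      ReentrantLock newLock = new ReentrantLock();
      ReentrantLock existing = locks.putIfAbsent(entityId, newLock);
      lock = (existing == null) ? newLock : existing;
    }
    lock.lock();
    try {
      resolveStartTimeAndWriteBatch.run();   // look up/assign the start time, then write the batch
    } finally {
      lock.unlock();
    }
  }
}
{code}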
[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919705#comment-13919705 ] Vinod Kumar Vavilapalli commented on YARN-986: -- I'm giving Jenkins a try again to be sure this issue still persists.. RM DT token service should have service addresses of both RMs - Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, yarn-986-prelim-0.patch Previously: YARN should use cluster-id as token service address This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1781) NM should allow users to specify max disk utilization for local disks
Varun Vasudev created YARN-1781: --- Summary: NM should allow users to specify max disk utilization for local disks Key: YARN-1781 URL: https://issues.apache.org/jira/browse/YARN-1781 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Varun Vasudev This is related to YARN-257(it's probably a sub task?). Currently, the NM does not detect full disks and allows full disks to be used by containers leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should at least detect full disks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919724#comment-13919724 ] Hadoop QA commented on YARN-1717: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632512/YARN-1717.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3244//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3244//console This message is automatically generated. Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, YARN-1717.8.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby
[ https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1761: Attachment: YARN-1766.2.patch RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby
[ https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1761: Attachment: (was: YARN-1766.2.patch) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby
[ https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1761: Attachment: YARN-1761.2.patch RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby
[ https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919728#comment-13919728 ] Xuan Gong commented on YARN-1761: - submit the same patch again RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby
[ https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919697#comment-13919697 ] Xuan Gong commented on YARN-1761: - bq. Remote-configuration-provider on RM is a server side property. We will not use it to specify client-side configuration. Given that, why do we need to use the config-provider on the client side? Yes. We do not need it in the client side. RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1761.1.patch, YARN-1761.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919684#comment-13919684 ] Hudson commented on YARN-1730: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5260 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5260/]) YARN-1730. Implemented simple write-locking in the LevelDB based timeline-store. Contributed by Billie Rinaldi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1574145) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Fix For: 2.4.0 Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch, YARN-1730.4.patch, YARN-1730.5.patch, YARN-1730.6.patch Although the leveldb writes are performed atomically in a batch, a start time for the entity needs to identified before each write. Thus a per-entity write lock should be acquired. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919685#comment-13919685 ] Vinod Kumar Vavilapalli commented on YARN-986: -- Thanks Karthik, the latest patch looks good. I wish there were more tests directly validating tokens across fail-over. In the interest of progress, I am fine for now with your manual testing, we can file a separate ticket for that. The test failures are unrelated, commented on HDFS-6040. But I am not sure there are any test or other issues with our patch itself. Let's see how HDFS-6040 goes. Or we can run the jenkins script with tests offline. RM DT token service should have service addresses of both RMs - Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, yarn-986-prelim-0.patch Previously: YARN should use cluster-id as token service address This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby
[ https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1761: Attachment: YARN-1761.2.patch RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby Key: YARN-1761 URL: https://issues.apache.org/jira/browse/YARN-1761 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1761.1.patch, YARN-1761.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919732#comment-13919732 ] Zhijie Shen commented on YARN-1389: --- Thanks for the new patch. Here're some more comments. 1. I still see <code>ApplicationHistoryServer</code> in ApplicationClientProtocol. And some of the description doesn't sound accurate. For example, {code} + * <p> + * The interface used by clients to get a report of all Application attempts + * in the cluster from the <code>ApplicationHistoryServer</code>. + * </p> {code} Please double check the javadoc. 2. ApplicationHistoryProtocol's javadoc has been wrongly modified. 3. Is it better to simplify the following condition? Same for all the similar conditions in the patch. {code} + if (!((e.getClass() == ApplicationNotFoundException.class) || (e + .getClass() == ApplicationAttemptNotFoundException.class))) { {code} to {code} + if (e.getClass() != ApplicationNotFoundException.class && e + .getClass() != ApplicationAttemptNotFoundException.class) { {code} 4. Please match the NotFoundException that will be thrown in ClientRMService, and that is analyzed in YarnClientImpl. 5. It is still an in-progress patch, isn't it? The test cases are still missing. bq. 4. Users are not able to get completed application list via YarnClient bq. Done I didn't see the change to allow the user to get the application list from the history. bq. These are just utility functions, do you think they are needed in RMAppAttempt and RMContainer? Please see what RMApp does. ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs -- Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state
[ https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919736#comment-13919736 ] Jian He commented on YARN-1752: --- bq. What is the procedure to rerun the HadoopQA? You can submit the same patch again and comment that you are resubmitting it to kick off Jenkins. The patch looks good, but there's still a typo in the code comment "tries to register more than once", which was introduced by an earlier patch; can you fix that also? Thanks! Unexpected Unregistered event at Attempt Launched state --- Key: YARN-1752 URL: https://issues.apache.org/jira/browse/YARN-1752 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch, YARN-1752.4.patch {code} 2014-02-21 14:56:03,453 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: UNREGISTERED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:695) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919750#comment-13919750 ] Jason Lowe commented on YARN-1781: -- Note that we may need to do more than just mark disks as unusable once they are full for a specified definition of full. I suspect a disk being full is a more transient kind of failure than other failures, and it would be nice if full disks were added back in to the list of good dirs once they fall below a threshold. Not a hard requirement necessarily for this JIRA, but I can see it being an immediate followup request if not implemented. The recovery from full may be covered by YARN-90 when that's implemented. NM should allow users to specify max disk utilization for local disks - Key: YARN-1781 URL: https://issues.apache.org/jira/browse/YARN-1781 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Varun Vasudev This is related to YARN-257(it's probably a sub task?). Currently, the NM does not detect full disks and allows full disks to be used by containers leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should at least detect full disks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reassigned YARN-1781: --- Assignee: Varun Vasudev NM should allow users to specify max disk utilization for local disks - Key: YARN-1781 URL: https://issues.apache.org/jira/browse/YARN-1781 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev This is related to YARN-257(it's probably a sub task?). Currently, the NM does not detect full disks and allows full disks to be used by containers leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should at least detect full disks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919766#comment-13919766 ] Varun Vasudev commented on YARN-1781: - My plan is to work YARN-90 once a patch for this gets checked in. NM should allow users to specify max disk utilization for local disks - Key: YARN-1781 URL: https://issues.apache.org/jira/browse/YARN-1781 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev This is related to YARN-257(it's probably a sub task?). Currently, the NM does not detect full disks and allows full disks to be used by containers leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should at least detect full disks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.
[ https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919765#comment-13919765 ] Vinod Kumar Vavilapalli commented on YARN-1766: --- Hm.. in that case, can we validate default values of all things - queue config, admin-acls, proxy-config etc. when starting RM with FSBCP? When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration. --- Key: YARN-1766 URL: https://issues.apache.org/jira/browse/YARN-1766 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1766.1.patch, YARN-1766.2.patch Right now, we have FileSystemBasedConfigurationProvider to let Users upload the configurations into remote File System, and let different RMs share the same configurations. During the initiation, RM will load the configurations from Remote File System. So when RM initiates the services, it should use the loaded Configurations instead of using the bootstrap configurations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reassigned YARN-90: - Assignee: Varun Vasudev NodeManager should identify failed disks becoming good back again - Key: YARN-90 URL: https://issues.apache.org/jira/browse/YARN-90 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Ravi Gummadi Assignee: Varun Vasudev Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch, YARN-90.patch MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), NodeManager needs restart. This JIRA is to improve NodeManager to reuse good disks(which could be bad some time back). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1389: -- Summary: ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs (was: ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs - Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1781: -- Issue Type: Sub-task (was: Bug) Parent: YARN-257 NM should allow users to specify max disk utilization for local disks - Key: YARN-1781 URL: https://issues.apache.org/jira/browse/YARN-1781 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev This is related to YARN-257(it's probably a sub task?). Currently, the NM does not detect full disks and allows full disks to be used by containers leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should at least detect full disks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1781: Attachment: apache-yarn-1781.0.patch NM should allow users to specify max disk utilization for local disks - Key: YARN-1781 URL: https://issues.apache.org/jira/browse/YARN-1781 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1781.0.patch This is related to YARN-257(it's probably a sub task?). Currently, the NM does not detect full disks and allows full disks to be used by containers leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should at least detect full disks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1414: -- Attachment: YARN-1221-v2.patch with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs - Key: YARN-1414 URL: https://issues.apache.org/jira/browse/YARN-1414 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.2.0 Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919867#comment-13919867 ] Hadoop QA commented on YARN-986: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632426/yarn-986-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3245//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3245//console This message is automatically generated. RM DT token service should have service addresses of both RMs - Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, yarn-986-prelim-0.patch Previously: YARN should use cluster-id as token service address This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919868#comment-13919868 ] Hadoop QA commented on YARN-1781: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632576/apache-yarn-1781.0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3249//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3249//console This message is automatically generated. NM should allow users to specify max disk utilization for local disks - Key: YARN-1781 URL: https://issues.apache.org/jira/browse/YARN-1781 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1781.0.patch This is related to YARN-257(it's probably a sub task?). Currently, the NM does not detect full disks and allows full disks to be used by containers leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should at least detect full disks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1780) Improve logging in timeline service
[ https://issues.apache.org/jira/browse/YARN-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1780: -- Attachment: YARN-1780.1.patch Created a patch to do the following things: 1. In TimelineClientImpl, log the timeline service address at info level. 2. In TimelineClientImpl, catch the runtime exception thrown by the jersey client when posting entities, and log it at error level. The reason for doing this is that some important errors, such as the connection refused exception, are wrapped in a runtime exception by the jersey client. 3. In TimelineWebServices, log the posted entities' IDs at info level to show that the requests have been processed by the timeline service, and log the complete posted JSON content at debug level if necessary. By doing this, it will be easier to trace whether an entity has been successfully posted to the timeline service. Improve logging in timeline service --- Key: YARN-1780 URL: https://issues.apache.org/jira/browse/YARN-1780 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1780.1.patch The server side of timeline service is lacking logging information, which makes debugging difficult -- This message was sent by Atlassian JIRA (v6.2#6252)
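A hedged sketch of the client-side logging described in item 2 above: wrap the post so that errors the HTTP client wraps in a RuntimeException are surfaced, and dump the payload only at debug level. It uses java.util.logging as a stand-in for the project's logging and a Runnable as a stand-in for the jersey call; it is not the TimelineClientImpl code in the patch.
{code}
import java.util.logging.Level;
import java.util.logging.Logger;

public class TimelinePostLoggingSketch {
  private static final Logger LOG = Logger.getLogger(TimelinePostLoggingSketch.class.getName());

  /** Wrap a client-side post so wrapped errors (e.g. connection refused) are logged, not swallowed. */
  public static void postEntities(String serviceAddress, String entitiesJson, Runnable doPost) {
    LOG.info("Posting timeline entities to " + serviceAddress);
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Payload: " + entitiesJson);          // debug-level dump of the posted JSON
    }
    try {
      doPost.run();                                  // stand-in for the actual HTTP client call
    } catch (RuntimeException e) {
      LOG.log(Level.SEVERE, "Failed to post timeline entities to " + serviceAddress, e);
      throw e;
    }
  }
}
{code}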
[jira] [Updated] (YARN-1780) Improve logging in timeline service
[ https://issues.apache.org/jira/browse/YARN-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1780: -- Description: It's difficult to trace whether the client has successfully posted the entity to the timeline service or not. (was: The server side of timeline service is lacking logging information, which makes debugging difficult) Improve logging in timeline service --- Key: YARN-1780 URL: https://issues.apache.org/jira/browse/YARN-1780 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1780.1.patch It's difficult to trace whether the client has successfully posted the entity to the timeline service or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919893#comment-13919893 ] Vinod Kumar Vavilapalli commented on YARN-986: -- TestNMClient is unrelated. Time to check this in. RM DT token service should have service addresses of both RMs - Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, yarn-986-prelim-0.patch Previously: YARN should use cluster-id as token service address This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cindy Li updated YARN-1525: --- Attachment: Yarn1525.secure.patch works in secure cluster. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919996#comment-13919996 ] Varun Vasudev commented on YARN-1781: - The attached patch adds support for checking disk utilization as part of the checkDirs function in DirectoryCollection. Disk utilization can be specified as a percentage via the yarn config, with the default value set to 1.0F(use the full disk). NM should allow users to specify max disk utilization for local disks - Key: YARN-1781 URL: https://issues.apache.org/jira/browse/YARN-1781 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1781.0.patch This is related to YARN-257(it's probably a sub task?). Currently, the NM does not detect full disks and allows full disks to be used by containers leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should at least detect full disks. -- This message was sent by Atlassian JIRA (v6.2#6252)
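A minimal sketch of the kind of utilization check the previous comment describes for DirectoryCollection.checkDirs(): compute the used fraction of the volume holding a local dir and compare it against a configurable cap, where 1.0 means the full disk may be used. The names and the main() example are illustrative, not the attached patch.
{code}
import java.io.File;

public class DiskUtilizationCheckSketch {
  /**
   * Returns true when the disk holding 'dir' is below the configured utilization cap.
   * maxUtilization is a fraction in (0.0, 1.0]; 1.0 means "use the full disk".
   */
  public static boolean isUsable(File dir, float maxUtilization) {
    long total = dir.getTotalSpace();
    if (total <= 0) {
      return false;                      // cannot stat the volume; treat it as bad
    }
    long used = total - dir.getUsableSpace();
    float utilization = (float) used / total;
    return utilization < maxUtilization;
  }

  public static void main(String[] args) {
    File localDir = new File("/tmp");    // stand-in for an NM local dir
    System.out.println("usable: " + isUsable(localDir, 0.90f));
  }
}
{code}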
[jira] [Created] (YARN-1782) CLI should let users to query cluster metrics
Zhijie Shen created YARN-1782: - Summary: CLI should let users to query cluster metrics Key: YARN-1782 URL: https://issues.apache.org/jira/browse/YARN-1782 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Like RM webUI and RESTful services, YARN CLI should also enable users to query the cluster metrics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-986) RM DT token service should have service addresses of both RMs
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920081#comment-13920081 ] Hudson commented on YARN-986: - SUCCESS: Integrated in Hadoop-trunk-Commit #5261 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5261/]) YARN-986. Changed client side to be able to figure out the right RM Delegation token for the right ResourceManager when HA is enabled. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1574190) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenSelector.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/TestClientRMProxy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestRMDelegationTokens.java RM DT token service should have service addresses of both RMs - Key: YARN-986 URL: https://issues.apache.org/jira/browse/YARN-986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.4.0 Attachments: yarn-986-1.patch, yarn-986-2.patch, yarn-986-3.patch, yarn-986-prelim-0.patch Previously: YARN should use cluster-id as token service address This needs to be done to support non-ip based fail over of RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to appropriate final IP and then be able to select tokens via TokenSelectors. 
Some workarounds for other related issues were put in place at YARN-945. -- This message was sent by Atlassian JIRA (v6.2#6252)
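To illustrate the client-side idea described in YARN-986 (a minimal sketch only, with hypothetical class and method names, not the committed ClientRMProxy/RMDelegationTokenSelector code): the token's service field can carry the addresses of both RMs, and the client matches a token if any listed address corresponds to the RM it is currently contacting.
{code}
// Hypothetical sketch: pack both RM service addresses into one token
// "service" string and match against whichever RM the client talks to.
import java.util.Arrays;
import java.util.List;

public class HaTokenServiceSketch {

  // e.g. buildService("rm1.example.com:8032", "rm2.example.com:8032")
  //   -> "rm1.example.com:8032,rm2.example.com:8032"
  static String buildService(String... rmAddresses) {
    return String.join(",", rmAddresses);
  }

  // A token whose service lists several addresses is usable against any of them.
  static boolean matches(String tokenService, String currentRmAddress) {
    List<String> addresses = Arrays.asList(tokenService.split(","));
    return addresses.contains(currentRmAddress);
  }

  public static void main(String[] args) {
    String service = buildService("rm1.example.com:8032", "rm2.example.com:8032");
    System.out.println(matches(service, "rm2.example.com:8032")); // prints true
  }
}
{code}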
[jira] [Updated] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.
[ https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1766: Attachment: YARN-1766.3.patch When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration. --- Key: YARN-1766 URL: https://issues.apache.org/jira/browse/YARN-1766 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch Right now, we have FileSystemBasedConfigurationProvider to let Users upload the configurations into remote File System, and let different RMs share the same configurations. During the initiation, RM will load the configurations from Remote File System. So when RM initiates the services, it should use the loaded Configurations instead of using the bootstrap configurations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.
[ https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920089#comment-13920089 ] Xuan Gong commented on YARN-1766: - modify the testcase to validate default values for queue config, admin-acls, proxy-config, exclude-nodelists, SuperGroupMapping and user-to-groups mapping. Also, manually call refreshSuperUserGroupsConfiguration when RM does the initiation. When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration. --- Key: YARN-1766 URL: https://issues.apache.org/jira/browse/YARN-1766 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch Right now, we have FileSystemBasedConfigurationProvider to let Users upload the configurations into remote File System, and let different RMs share the same configurations. During the initiation, RM will load the configurations from Remote File System. So when RM initiates the services, it should use the loaded Configurations instead of using the bootstrap configurations. -- This message was sent by Atlassian JIRA (v6.2#6252)
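For readers unfamiliar with the configuration-provider flow being tested here, a minimal sketch of the behavior YARN-1766 asks for (a hypothetical helper, not the ResourceManager code itself): merge the configuration files stored on the shared remote FileSystem into the bootstrap Configuration, and hand the merged Configuration to service initialization.
{code}
// Hedged sketch, not the actual RM code: load admin-uploaded *.xml files from a
// shared remote FileSystem on top of the local bootstrap configuration, so every
// RM in the HA pair initializes its services from the same settings.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteConfLoaderSketch {

  // remoteDir is assumed to hold files such as capacity-scheduler.xml or
  // yarn-site.xml that the admin uploaded for all RMs to share.
  public static Configuration loadFromRemote(Configuration bootstrapConf,
      Path remoteDir, String... fileNames) throws IOException {
    Configuration loaded = new Configuration(bootstrapConf); // start from bootstrap values
    FileSystem fs = remoteDir.getFileSystem(bootstrapConf);
    for (String name : fileNames) {
      Path file = new Path(remoteDir, name);
      if (fs.exists(file)) {
        loaded.addResource(fs.open(file)); // remote values override bootstrap ones
      }
    }
    return loaded; // this, not bootstrapConf, should drive service initialization
  }
}
{code}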
[jira] [Created] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background
Arpit Gupta created YARN-1783: - Summary: yarn application does not make any progress even when no other application is running when RM is being restarted in the background Key: YARN-1783 URL: https://issues.apache.org/jira/browse/YARN-1783 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background
[ https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920105#comment-13920105 ] Arpit Gupta commented on YARN-1783: --- application that took the longest is application_1393347856479_0014 {code} 14/02/25 17:41:01 INFO mapreduce.Job: Job job_1393347856479_0017 running in uber mode : false 14/02/25 17:41:01 INFO mapreduce.Job: map 0% reduce 0% 2014-02-25 17:41:02,145|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-02-25 17:41:03,419|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):6 2014-02-25 17:41:03,419|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-02-25 17:41:03,420|beaver.machine|INFO|application_1393347856479_0013 Sleep job MAPREDUCEhrt_qa default ACCEPTED UNDEFINED 0% N/A 2014-02-25 17:41:03,420|beaver.machine|INFO|application_1393347856479_0014 test_mapred_ha_pending_job_rm_1393349992-1 MAPREDUCE hrt_qa defaultACCEPTED UNDEFINED 0%null 2014-02-25 17:41:03,420|beaver.machine|INFO|application_1393347856479_0018 test_mapred_ha_pending_job_rm_1393349992-3 MAPREDUCE hrt_qa default RUNNING UNDEFINED 5% http://hor12n10.gq1.ygridcore.net:41840 2014-02-25 17:41:03,420|beaver.machine|INFO|application_1393347856479_0017 test_mapred_ha_pending_job_rm_1393349992-2 MAPREDUCE hrt_qa default RUNNING UNDEFINED 5% http://hor12n10.gq1.ygridcore.net:51732 2014-02-25 17:41:03,421|beaver.machine|INFO|application_1393347856479_0016 test_mapred_ha_pending_job_rm_1393349992-4 MAPREDUCE hrt_qa default RUNNING UNDEFINED 5% http://hor12n08.gq1.ygridcore.net:50966 2014-02-25 17:41:03,421|beaver.machine|INFO|application_1393347856479_0015 test_mapred_ha_pending_job_rm_1393349992-0 MAPREDUCE hrt_qa default RUNNING UNDEFINED 35.01% http://hor12n08.gq1.ygridcore.net:54998 {code} and this is when it completed {code} 2014-02-25 20:52:32,992|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-02-25 20:52:32,993|beaver.machine|INFO|Application-Id Application-NameApplication-Type User Queue State Final-State Progress Tracking-URL 2014-02-25 20:52:32,993|beaver.machine|INFO|application_1393347856479_0014 test_mapred_ha_pending_job_rm_1393349992-1 MAPREDUCE hrt_qa default RUNNING UNDEFINED 86.01% http://hor12n08.gq1.ygridcore.net:46622 14/02/25 20:52:35 INFO mapreduce.Job: map 100% reduce 100% 14/02/25 20:52:37 INFO mapreduce.Job: Job job_1393347856479_0014 completed successfully 14/02/25 20:52:37 INFO mapreduce.Job: Counters: 49 File System Counters {code} yarn application does not make any progress even when no other application is running when RM is being restarted in the background -- Key: YARN-1783 URL: https://issues.apache.org/jira/browse/YARN-1783 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1064) YarnConfiguration scheduler configuration constants are not consistent
[ https://issues.apache.org/jira/browse/YARN-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1064: --- Target Version/s: 2.4.0 Fix Version/s: (was: 2.4.0) Set target version 2.4.0. [~tucu00] - now that we have already shipped 2.2.0 GA and 2.3, do you think we should continue to call this a blocker? YarnConfiguration scheduler configuration constants are not consistent -- Key: YARN-1064 URL: https://issues.apache.org/jira/browse/YARN-1064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Priority: Blocker Labels: newbie Some of the scheduler configuration constants in YarnConfiguration have RM_PREFIX and others YARN_PREFIX. For consistency we should move all under the same prefix. -- This message was sent by Atlassian JIRA (v6.2#6252)
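For context, the inconsistency being tracked looks roughly like the following (the property names below are made up for illustration; only the two prefixes are taken from YarnConfiguration):
{code}
// Illustration only: two comparable scheduler settings hanging off different
// prefixes, which is the inconsistency YARN-1064 asks to clean up.
public class PrefixInconsistencySketch {
  static final String YARN_PREFIX = "yarn.";
  static final String RM_PREFIX = YARN_PREFIX + "resourcemanager.";

  // Hypothetical key under the ResourceManager prefix ...
  static final String SCHEDULER_KNOB_A = RM_PREFIX + "scheduler.knob-a";
  // ... and a comparable hypothetical key under the bare YARN prefix.
  static final String SCHEDULER_KNOB_B = YARN_PREFIX + "scheduler.knob-b";
}
{code}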
[jira] [Created] (YARN-1784) TestContainerAllocation assumes CapacityScheduler
Karthik Kambatla created YARN-1784: -- Summary: TestContainerAllocation assumes CapacityScheduler Key: YARN-1784 URL: https://issues.apache.org/jira/browse/YARN-1784 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor TestContainerAllocation assumes CapacityScheduler -- This message was sent by Atlassian JIRA (v6.2#6252)
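A typical way to remove that assumption is for the test to pin the scheduler it needs in its own configuration instead of relying on the build's default; a minimal sketch (not necessarily the eventual patch) is:
{code}
// Minimal sketch: request the CapacityScheduler explicitly in the test's
// configuration so the test is not broken by a different default scheduler
// (e.g. when the FairScheduler is configured as the default).
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;

public class SchedulerPinningSketch {
  static YarnConfiguration capacitySchedulerConf() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setClass(YarnConfiguration.RM_SCHEDULER,
        CapacityScheduler.class, ResourceScheduler.class);
    return conf;
  }
}
{code}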
[jira] [Updated] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background
[ https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1783: - Target Version/s: 2.4.0 yarn application does not make any progress even when no other application is running when RM is being restarted in the background -- Key: YARN-1783 URL: https://issues.apache.org/jira/browse/YARN-1783 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background
[ https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated YARN-1783: -- Description: Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. This does not happen for all applications but a few will hit this in the nightly run. was: Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. yarn application does not make any progress even when no other application is running when RM is being restarted in the background -- Key: YARN-1783 URL: https://issues.apache.org/jira/browse/YARN-1783 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. This does not happen for all applications but a few will hit this in the nightly run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920133#comment-13920133 ] Hadoop QA commented on YARN-1525: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632660/Yarn1525.secure.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3251//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3251//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3251//console This message is automatically generated. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920137#comment-13920137 ] Mit Desai commented on YARN-1670: - It is a code change and there are no unit tests for this change aggregated log writer can write more log data then it says is the log length Key: YARN-1670 URL: https://issues.apache.org/jira/browse/YARN-1670 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Mit Desai Priority: Critical Attachments: YARN-1670-b23.patch, YARN-1670.patch, YARN-1670.patch We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file, but where it reads is still log data from the previous file. What happened was the Log Length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue it first writes what the logfile length is, but then when it goes to write the log itself it just goes to the end of the file. There is a race condition here where if someone is still writing to the file when it goes to be aggregated the length written could be too small. We should have the write() routine stop when it writes whatever it said was the length. It would be nice if we could somehow tell the user it might be truncated but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType where it is using an int for curRead whereas it should be using a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now as it looks like the underlying decoder is doing the right thing and the len condition exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
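A sketch of the bounded-write idea proposed in the description (plain java.io, not the actual AggregatedLogFormat.LogValue code; the real format may record the length differently): write the declared length first and then copy at most that many bytes, so data appended to the log after the length was computed cannot spill past the recorded Log Length.
{code}
// Sketch only: cap the copied log data at the length that was declared in the
// header, so a file that keeps growing during aggregation cannot make the
// payload longer than the recorded "Log Length".
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedLogCopySketch {
  static void writeLogFile(DataOutputStream out, InputStream log, long declaredLength)
      throws IOException {
    out.writeLong(declaredLength); // header the reader will trust
    byte[] buf = new byte[64 * 1024];
    long remaining = declaredLength;
    while (remaining > 0) {
      int read = log.read(buf, 0, (int) Math.min(buf.length, remaining));
      if (read == -1) {
        break; // file shorter than declared; nothing more to copy
      }
      out.write(buf, 0, read);
      remaining -= read;
    }
    // Anything appended after declaredLength was computed is intentionally dropped.
  }
}
{code}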
[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.
[ https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920159#comment-13920159 ] Hadoop QA commented on YARN-1766: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632675/YARN-1766.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3252//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3252//console This message is automatically generated. When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration. --- Key: YARN-1766 URL: https://issues.apache.org/jira/browse/YARN-1766 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch Right now, we have FileSystemBasedConfigurationProvider to let Users upload the configurations into remote File System, and let different RMs share the same configurations. During the initiation, RM will load the configurations from Remote File System. So when RM initiates the services, it should use the loaded Configurations instead of using the bootstrap configurations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.
[ https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920195#comment-13920195 ] Vinod Kumar Vavilapalli commented on YARN-1766: --- +1, looks good. It's much better now, given we also caught a bug! Checking this in. When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration. --- Key: YARN-1766 URL: https://issues.apache.org/jira/browse/YARN-1766 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch Right now, we have FileSystemBasedConfigurationProvider to let Users upload the configurations into remote File System, and let different RMs share the same configurations. During the initiation, RM will load the configurations from Remote File System. So when RM initiates the services, it should use the loaded Configurations instead of using the bootstrap configurations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920203#comment-13920203 ] Cindy Li commented on YARN-1525: The findbug warning is easy to fix. Other than that, Karthik, Vinod, Xuan, can any of you give the patch a final review? Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920208#comment-13920208 ] Xuan Gong commented on YARN-1525: - Looking at the patch now Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cindy Li updated YARN-1525: --- Attachment: Yarn1525.secure.patch Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch, Yarn1525.secure.patch When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1766) When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.
[ https://issues.apache.org/jira/browse/YARN-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920246#comment-13920246 ] Hudson commented on YARN-1766: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5262/]) YARN-1766. Fixed a bug in ResourceManager to use configuration loaded from the configuration-provider when booting up. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1574252) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration. --- Key: YARN-1766 URL: https://issues.apache.org/jira/browse/YARN-1766 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1766.1.patch, YARN-1766.2.patch, YARN-1766.3.patch Right now, we have FileSystemBasedConfigurationProvider to let Users upload the configurations into remote File System, and let different RMs share the same configurations. During the initiation, RM will load the configurations from Remote File System. So when RM initiates the services, it should use the loaded Configurations instead of using the bootstrap configurations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920287#comment-13920287 ] Hadoop QA commented on YARN-1525: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632705/Yarn1525.secure.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3253//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3253//console This message is automatically generated. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch, Yarn1525.secure.patch When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1785) FairScheduler treats app lookup failures as ERRORs
bc Wong created YARN-1785: - Summary: FairScheduler treats app lookup failures as ERRORs Key: YARN-1785 URL: https://issues.apache.org/jira/browse/YARN-1785 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: bc Wong When invoking the /ws/v1/cluster/apps endpoint, RM will eventually get to RMAppImpl#createAndGetApplicationReport, which calls RMAppAttemptImpl#getApplicationResourceUsageReport, which looks up the app in the scheduler, which may or may not exist. So FairScheduler shouldn't log an error for every lookup failure: {noformat} 2014-02-17 08:23:21,240 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1392419715319_0135_01 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
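A sketch of what the requested change amounts to (simplified stand-ins, not the FairScheduler source): treat an unknown attempt as an expected outcome of the lookup and report it quietly instead of at ERROR.
{code}
// Simplified sketch: return null and log at DEBUG when the attempt is not (or
// no longer) tracked, since RMAppImpl's report building hits this path routinely.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class QuietLookupSketch {
  private static final Log LOG = LogFactory.getLog(QuietLookupSketch.class);

  // Hypothetical stand-in for the scheduler's attempt map and report type.
  private final Map<String, Object> attempts = new ConcurrentHashMap<String, Object>();

  public Object getAppResourceUsageReport(String appAttemptId) {
    Object attempt = attempts.get(appAttemptId);
    if (attempt == null) {
      LOG.debug("Request for appInfo of unknown attempt " + appAttemptId);
      return null; // callers already handle a missing report
    }
    return attempt; // the real method would build an ApplicationResourceUsageReport
  }
}
{code}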
[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background
[ https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920355#comment-13920355 ] Jian He commented on YARN-1783: --- The problem is that when the NM resyncs with the RM, the NM cleans the finished containers from its context before it processes the resync command. But after the RM restarts, it is still waiting for the previous AM container's finished event from the NM so that it knows to launch a new attempt. yarn application does not make any progress even when no other application is running when RM is being restarted in the background -- Key: YARN-1783 URL: https://issues.apache.org/jira/browse/YARN-1783 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. This does not happen for all applications but a few will hit this in the nightly run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background
[ https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1783: -- Attachment: YARN-1783.1.patch The patch splits the getContainerStatus and removeCompletedContainers logic out of the original method getNodeStatusAndUpdateContainersInContext, and moves removeCompletedContainers to after the nodeStatusUpdater receives the resync command. Also rewrote TestNodeStatusUpdater.testCompletedContainerStatusBackup(), as that test case was broken. yarn application does not make any progress even when no other application is running when RM is being restarted in the background -- Key: YARN-1783 URL: https://issues.apache.org/jira/browse/YARN-1783 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1783.1.patch Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. This does not happen for all applications but a few will hit this in the nightly run. -- This message was sent by Atlassian JIRA (v6.2#6252)
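In outline, the ordering change described above looks like the following (a simplified sketch with stand-in types, not the NodeStatusUpdaterImpl code): collect container statuses for the heartbeat first, and prune completed containers from the NM context only after the RM's response is known not to be a resync.
{code}
// Simplified sketch of the resync-safe ordering: completed containers stay in
// the NM context until a normal heartbeat response is seen, so a restarted RM
// that answers with RESYNC can still learn that the previous AM container finished.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResyncOrderingSketch {
  enum NodeAction { NORMAL, RESYNC }

  // containerId -> completed? (stand-in for the NM context)
  private final Map<String, Boolean> containers = new HashMap<String, Boolean>();

  List<String> getContainerStatuses() {
    // Report everything, including completed containers; do not prune here.
    return new ArrayList<String>(containers.keySet());
  }

  void removeCompletedContainers() {
    containers.values().removeIf(completed -> completed);
  }

  void heartbeatOnce(NodeAction responseAction) {
    send(getContainerStatuses());
    if (responseAction == NodeAction.RESYNC) {
      return; // re-register; completed containers will be reported again
    }
    removeCompletedContainers(); // safe to forget only on a normal response
  }

  private void send(List<String> statuses) {
    // no-op placeholder for the RPC to the RM
  }
}
{code}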
[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background
[ https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920377#comment-13920377 ] Jian He commented on YARN-1783: --- The newly added test passes with the core code change and fails without it. yarn application does not make any progress even when no other application is running when RM is being restarted in the background -- Key: YARN-1783 URL: https://issues.apache.org/jira/browse/YARN-1783 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1783.1.patch Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. This does not happen for all applications but a few will hit this in the nightly run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background
[ https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920392#comment-13920392 ] Hadoop QA commented on YARN-1783: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632737/YARN-1783.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3254//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3254//console This message is automatically generated. yarn application does not make any progress even when no other application is running when RM is being restarted in the background -- Key: YARN-1783 URL: https://issues.apache.org/jira/browse/YARN-1783 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1783.1.patch Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. This does not happen for all applications but a few will hit this in the nightly run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920395#comment-13920395 ] Karthik Kambatla commented on YARN-1525: Verified - the redirection works as expected on a secure cluster. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch, Yarn1525.secure.patch When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background
[ https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1783: -- Attachment: YARN-1783.2.patch new patch with minor fix yarn application does not make any progress even when no other application is running when RM is being restarted in the background -- Key: YARN-1783 URL: https://issues.apache.org/jira/browse/YARN-1783 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Critical Attachments: YARN-1783.1.patch, YARN-1783.2.patch Noticed that during HA tests some tests took over 3 hours to run when the test failed. Looking at the logs i see the application made no progress for a very long time. However if i look at application log from yarn it actually ran in 5 mins I am seeing same behavior when RM was being restarted in the background and when both RM and AM were being restarted. This does not happen for all applications but a few will hit this in the nightly run. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920402#comment-13920402 ] Karthik Kambatla commented on YARN-1525: Comments on the latest patch (it would be easier to refer to patches if they were versioned):
# RMHAUtils#findActiveRMId: looks like we return before resetting the id to the initial RM-id? It might be safer to just create a copy of the conf - new YarnConfiguration(yarnConf) - so we don't have to store the initial value and reset it at the end.
{code}
      if (haState.equals(HAServiceState.ACTIVE)) {
        yarnConf.set(YarnConfiguration.RM_HA_ID, rmId);
        return currentId;
      }
    } catch (Exception e) {
    }
  }
  yarnConf.set(YarnConfiguration.RM_HA_ID, rmId);
{code}
# RMDispatcher: remove the TODO
{code}
  RMDispatcher(WebApp webApp, Injector injector, Router router) {
    super(webApp, injector, router);
    // TODO Auto-generated constructor stub
  }
{code}
# Don't see much use for RMWebApp#standbyMode. Why not just use isStandbyMode()?
# Nit: spurious change. Get rid of it?
{code}
-  void setRedirectPath(String path) { this.redirectPath = path; }
+  protected void setRedirectPath(String path) {
+    this.redirectPath = path;
+  }
{code}
Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch, Yarn1525.secure.patch When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.2#6252)
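The copy-the-conf suggestion in comment #1 would look roughly like this (a sketch under the assumption that findActiveRMId probes each RM id in turn; not the actual patch):
{code}
// Sketch only: probe each RM id on a throw-away copy of the configuration so the
// caller's conf is never mutated and nothing needs to be reset on the way out.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FindActiveRmSketch {
  static String findActiveRMId(Configuration conf, Iterable<String> rmIds) {
    for (String rmId : rmIds) {
      YarnConfiguration probeConf = new YarnConfiguration(conf); // private copy
      probeConf.set(YarnConfiguration.RM_HA_ID, rmId);
      if (isActive(probeConf)) { // hypothetical helper: asks that RM for its HA state
        return rmId;
      }
    }
    return null; // no active RM reachable
  }

  private static boolean isActive(YarnConfiguration conf) {
    // Placeholder: the real check would query the RM's HAServiceState over RPC.
    return false;
  }
}
{code}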
[jira] [Created] (YARN-1786) TestRMAppTransitions occasionally fail
shenhong created YARN-1786: -- Summary: TestRMAppTransitions occasionally fail Key: YARN-1786 URL: https://issues.apache.org/jira/browse/YARN-1786 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: shenhong -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail
[ https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated YARN-1786: --- Description: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} was: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! 
junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} TestRMAppTransitions occasionally fail -- Key: YARN-1786 URL: https://issues.apache.org/jira/browse/YARN-1786 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: shenhong {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at
[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail
[ https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated YARN-1786: --- Description: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} TestRMAppTransitions occasionally fail -- Key: YARN-1786 URL: https://issues.apache.org/jira/browse/YARN-1786 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: shenhong {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! 
junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail
[ https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated YARN-1786: --- Description: TestRMAppTransitions often fail with application finish time is not greater then 0, following is log: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} was: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! 
junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} TestRMAppTransitions occasionally fail -- Key: YARN-1786 URL: https://issues.apache.org/jira/browse/YARN-1786 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: shenhong TestRMAppTransitions often fail with application finish time is not greater then 0, following is log: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at
[jira] [Commented] (YARN-1786) TestRMAppTransitions occasionally fail
[ https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920410#comment-13920410 ] shenhong commented on YARN-1786: Here is the code:
{code}
  private void sendAppUpdateSavedEvent(RMApp application) {
    RMAppEvent event =
        new RMAppUpdateSavedEvent(application.getApplicationId(), null);
    application.handle(event);
    rmDispatcher.await();
  }

  private void sendAttemptUpdateSavedEvent(RMApp application) {
    application.getCurrentAppAttempt().handle(
        new RMAppAttemptUpdateSavedEvent(application.getCurrentAppAttempt()
          .getAppAttemptId(), null));
  }
{code}
In sendAttemptUpdateSavedEvent(), there is no rmDispatcher.await() after handling the event; it should be changed to:
{code}
  private void sendAttemptUpdateSavedEvent(RMApp application) {
    application.getCurrentAppAttempt().handle(
        new RMAppAttemptUpdateSavedEvent(application.getCurrentAppAttempt()
          .getAppAttemptId(), null));
    rmDispatcher.await();
  }
{code}
TestRMAppTransitions occasionally fail -- Key: YARN-1786 URL: https://issues.apache.org/jira/browse/YARN-1786 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: shenhong TestRMAppTransitions often fail with application finish time is not greater then 0, following is log: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1786) TestRMAppTransitions occasionally fail
[ https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong reassigned YARN-1786: -- Assignee: shenhong TestRMAppTransitions occasionally fail -- Key: YARN-1786 URL: https://issues.apache.org/jira/browse/YARN-1786 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: shenhong Assignee: shenhong Attachments: YARN-1786.patch TestRMAppTransitions often fail with application finish time is not greater then 0, following is log: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail
[ https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated YARN-1786: --- Attachment: YARN-1786.patch Add a patch to fix the bug! TestRMAppTransitions occasionally fail -- Key: YARN-1786 URL: https://issues.apache.org/jira/browse/YARN-1786 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: shenhong Attachments: YARN-1786.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1786) TestRMAppTransitions occasionally fail
[ https://issues.apache.org/jira/browse/YARN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated YARN-1786: --- Description: TestRMAppTransitions often fail with application finish time is not greater then 0, following is log: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} was: TestRMAppTransitions often fail with application finish time is not greater then 0, following is log: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertAppAndAttemptKilled(TestRMAppTransitions.java:310) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedKill(TestRMAppTransitions.java:624) testAppRunningKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.033 sec FAILURE! 
junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) testAppRunningKill[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.036 sec FAILURE! junit.framework.AssertionFailedError: application finish time is not greater then 0 at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertTimesAtFinish(TestRMAppTransitions.java:283) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.assertKilled(TestRMAppTransitions.java:298) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppRunningKill(TestRMAppTransitions.java:646) {code} TestRMAppTransitions occasionally fail -- Key: YARN-1786 URL: https://issues.apache.org/jira/browse/YARN-1786 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: shenhong Assignee: shenhong Attachments: YARN-1786.patch TestRMAppTransitions often fail with application finish time is not greater then 0, following is log: {code} testAppAcceptedKill[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.04 sec FAILURE! junit.framework.AssertionFailedError: