[jira] [Commented] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState
[ https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900227#comment-13900227 ] Hudson commented on YARN-1345: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/480/]) YARN-1345. Remove FINAL_SAVING state from YarnApplicationAttemptState. Contributed by Zhijie Shen (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567820) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Removing FINAL_SAVING from YarnApplicationAttemptState -- Key: YARN-1345 URL: https://issues.apache.org/jira/browse/YARN-1345 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1345.1.patch, YARN-1345.2.patch Whenever YARN-891 is done, we need to add the mapping of RMAppAttemptState.FINAL_SAVING - YarnApplicationAttemptState.FINAL_SAVING in RMServerUtils#createApplicationAttemptState -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1578) Fix how to read history file in FileSystemApplicationHistoryStore
[ https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900231#comment-13900231 ] Hudson commented on YARN-1578: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/480/]) YARN-1578. Fixed reading incomplete application attempt and container data in FileSystemApplicationHistoryStore. Contributed by Shinichi Yamashita. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567816) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/FileSystemApplicationHistoryStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestFileSystemApplicationHistoryStore.java Fix how to read history file in FileSystemApplicationHistoryStore - Key: YARN-1578 URL: https://issues.apache.org/jira/browse/YARN-1578 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: YARN-321 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Fix For: 2.4.0 Attachments: YARN-1578-2.patch, YARN-1578-3.patch, YARN-1578-4.patch, YARN-1578.patch, application_1390978867235_0001, resoucemanager.log, screenshot.png, screenshot2.pdf I carried out PiEstimator job at Hadoop cluster which applied YARN-321. After the job end and when I accessed Web UI of HistoryServer, it displayed 500. And HistoryServer daemon log was output as follows. 
{code}
2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory/appattempt/appattempt_1389146249925_0008_01
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
(snip...)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
    at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
(snip...)
{code}
From the ApplicationHistory file, I confirmed that there was a container that had not finished. According to the ResourceManager daemon log, the ResourceManager reserved this container but did not allocate it. The problem occurs when FileSystemApplicationHistoryStore reads container information that has no finish data in the history file. We should fix how FileSystemApplicationHistoryStore reads the history file so that it handles the case where finish data is missing.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
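To illustrate the failure mode described in YARN-1578, here is a minimal sketch of merging start and finish data when the finish record may be absent. The types ContainerStart, ContainerFinish, ContainerHistory and the helper HistoryMerge are hypothetical names made up for this example; they are not the classes used by FileSystemApplicationHistoryStore, and the committed patch may handle the case differently.

```java
/** Hypothetical start record: written when a container is launched. */
final class ContainerStart {
    final String containerId;
    final long startTime;

    ContainerStart(String containerId, long startTime) {
        this.containerId = containerId;
        this.startTime = startTime;
    }
}

/** Hypothetical finish record: may be absent if the container never completed. */
final class ContainerFinish {
    final long finishTime;
    final int exitStatus;

    ContainerFinish(long finishTime, int exitStatus) {
        this.finishTime = finishTime;
        this.exitStatus = exitStatus;
    }
}

/** Hypothetical merged view that a caller such as a web UI could render. */
final class ContainerHistory {
    final String containerId;
    final long startTime;
    final boolean finished;
    final long finishTime; // meaningful only when finished is true
    final int exitStatus;  // meaningful only when finished is true

    ContainerHistory(String containerId, long startTime, boolean finished,
                     long finishTime, int exitStatus) {
        this.containerId = containerId;
        this.startTime = startTime;
        this.finished = finished;
        this.finishTime = finishTime;
        this.exitStatus = exitStatus;
    }
}

final class HistoryMerge {
    private HistoryMerge() {}

    /** Merges start and finish data, tolerating a missing (null) finish record. */
    static ContainerHistory merge(ContainerStart start, ContainerFinish finish) {
        if (start == null) {
            throw new IllegalArgumentException("start data is required");
        }
        if (finish == null) {
            // No finish data in the history file: report the container as
            // unfinished instead of dereferencing null.
            return new ContainerHistory(start.containerId, start.startTime,
                                        false, -1L, -1);
        }
        return new ContainerHistory(start.containerId, start.startTime,
                                    true, finish.finishTime, finish.exitStatus);
    }
}
```

With this shape, a reserved-but-never-allocated container would come back as an unfinished entry rather than causing a NullPointerException while the page renders.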
[jira] [Commented] (YARN-1692) ConcurrentModificationException in fair scheduler AppSchedulable
[ https://issues.apache.org/jira/browse/YARN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900230#comment-13900230 ] Hudson commented on YARN-1692: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/480/]) Move YARN-1692 in CHANGES.txt (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567793) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt YARN-1692. ConcurrentModificationException in fair scheduler AppSchedulable (Sangjin Lee via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567788) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java ConcurrentModificationException in fair scheduler AppSchedulable Key: YARN-1692 URL: https://issues.apache.org/jira/browse/YARN-1692 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.5-alpha Reporter: Sangjin Lee Assignee: Sangjin Lee Fix For: 2.4.0 Attachments: yarn-1692-branch-2.3.patch, yarn-1692.patch We saw a ConcurrentModificationException thrown in the fair scheduler: {noformat} 2014-02-07 01:40:01,978 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Exception in fair scheduler UpdateThread java.util.ConcurrentModificationException at 
java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
    at java.util.HashMap$ValueIterator.next(HashMap.java:954)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.updateDemand(AppSchedulable.java:85)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.updateDemand(FSLeafQueue.java:125)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.updateDemand(FSParentQueue.java:82)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:217)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:195)
    at java.lang.Thread.run(Thread.java:724)
{noformat}
The map returned by FSSchedulerApp.getResourceRequests() is iterated over without proper synchronization.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
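For readers unfamiliar with this failure, the following is a minimal sketch of one common way to avoid it: guard both the writes and the iteration with the same lock. DemandTracker and its methods are hypothetical names for illustration only; this is not the code in AppSchedulable, and the actual YARN-1692 patch may take a different approach.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for per-resource request counts shared between threads. */
final class DemandTracker {
    // Plain HashMap: iterating it while another thread mutates it can throw
    // ConcurrentModificationException, so every access below holds the
    // DemandTracker monitor.
    private final Map<String, Integer> requests = new HashMap<>();

    synchronized void put(String resourceName, int containers) {
        requests.put(resourceName, containers);
    }

    synchronized void remove(String resourceName) {
        requests.remove(resourceName);
    }

    /** Sums the demand while holding the same lock the writers use. */
    synchronized int totalDemand() {
        int total = 0;
        for (int containers : requests.values()) {
            total += containers;
        }
        return total;
    }
}
```

The key point is that the update thread never sees the map mid-modification, because put, remove, and the iteration in totalDemand are mutually exclusive.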
[jira] [Commented] (YARN-1641) ZK store should attempt a write periodically to ensure it is still Active
[ https://issues.apache.org/jira/browse/YARN-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900234#comment-13900234 ] Hudson commented on YARN-1641: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/480/]) YARN-1641. ZK store should attempt a write periodically to ensure it is still Active. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567628) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java ZK store should attempt a write periodically to ensure it is still Active - Key: YARN-1641 URL: https://issues.apache.org/jira/browse/YARN-1641 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Fix For: 2.4.0 Attachments: yarn-1641-1.patch, yarn-1641-2.patch Fencing in ZK store kicks in when the RM tries to write something to the store. If the RM doesn't write anything to the store, it doesn't get fenced and can continue to assume being the Active. By periodically writing a file (say, every RM_ZK_TIMEOUT_MS seconds), we can ensure it gets fenced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
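The idea in YARN-1641 can be sketched roughly as follows: schedule a small write at a fixed interval and treat a failed write as a sign that this instance has been fenced. FencedStore, ActiveVerifier, and writeMarker are hypothetical names invented for this example (they are not the RMStateStore or ZKRMStateStore API), and the interval being in milliseconds is an assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical store whose writes fail once this instance has been fenced. */
interface FencedStore {
    void writeMarker() throws Exception;
}

/** Periodically attempts a write so that fencing is noticed even when idle. */
final class ActiveVerifier implements AutoCloseable {
    private final FencedStore store;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean active = new AtomicBoolean(true);

    ActiveVerifier(FencedStore store) {
        this.store = store;
    }

    /** Starts the periodic check; intervalMs is assumed to be milliseconds. */
    void start(long intervalMs) {
        timer.scheduleWithFixedDelay(this::checkOnce, intervalMs, intervalMs,
                                     TimeUnit.MILLISECONDS);
    }

    /** Performs one write attempt; a failure marks this instance as not Active. */
    void checkOnce() {
        try {
            store.writeMarker();
        } catch (Exception e) {
            active.set(false);
            timer.shutdown(); // no point in further attempts once fenced
        }
    }

    boolean isActive() {
        return active.get();
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

A caller would construct the verifier with its store, call start with the chosen interval, and consult isActive() before acting as the Active instance.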
[jira] [Commented] (YARN-1531) True up yarn command documentation
[ https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900226#comment-13900226 ] Hudson commented on YARN-1531: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/480/]) YARN-1531. True up yarn command documentation (Akira Ajisaka via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567775) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm True up yarn command documentation -- Key: YARN-1531 URL: https://issues.apache.org/jira/browse/YARN-1531 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: documentaion Fix For: 2.4.0 Attachments: YARN-1531.2.patch, YARN-1531.3.patch, YARN-1531.patch There are some options which are not written to Yarn Command document. For example, yarn rmadmin command options are as follows: {code} Usage: yarn rmadmin -refreshQueues -refreshNodes -refreshSuperUserGroupsConfiguration -refreshUserToGroupsMappings -refreshAdminAcls -refreshServiceAcl -getGroups [username] -help [cmd] -transitionToActive serviceId -transitionToStandby serviceId -failover [--forcefence] [--forceactive] serviceId serviceId -getServiceState serviceId -checkHealth serviceId {code} But some of the new options such as -getGroups, -transitionToActive, and -transitionToStandby are not documented. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState
[ https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900321#comment-13900321 ] Hudson commented on YARN-1345: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1672 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1672/]) YARN-1345. Remove FINAL_SAVING state from YarnApplicationAttemptState. Contributed by Zhijie Shen (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567820) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Removing FINAL_SAVING from YarnApplicationAttemptState -- Key: YARN-1345 URL: https://issues.apache.org/jira/browse/YARN-1345 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1345.1.patch, YARN-1345.2.patch Whenever YARN-891 is done, we need to add the mapping of RMAppAttemptState.FINAL_SAVING - YarnApplicationAttemptState.FINAL_SAVING in RMServerUtils#createApplicationAttemptState -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1641) ZK store should attempt a write periodically to ensure it is still Active
[ https://issues.apache.org/jira/browse/YARN-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900328#comment-13900328 ] Hudson commented on YARN-1641: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1672 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1672/]) YARN-1641. ZK store should attempt a write periodically to ensure it is still Active. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567628) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java ZK store should attempt a write periodically to ensure it is still Active - Key: YARN-1641 URL: https://issues.apache.org/jira/browse/YARN-1641 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Fix For: 2.4.0 Attachments: yarn-1641-1.patch, yarn-1641-2.patch Fencing in ZK store kicks in when the RM tries to write something to the store. If the RM doesn't write anything to the store, it doesn't get fenced and can continue to assume being the Active. By periodically writing a file (say, every RM_ZK_TIMEOUT_MS seconds), we can ensure it gets fenced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1531) True up yarn command documentation
[ https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900320#comment-13900320 ] Hudson commented on YARN-1531: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1672 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1672/]) YARN-1531. True up yarn command documentation (Akira Ajisaka via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567775) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm True up yarn command documentation -- Key: YARN-1531 URL: https://issues.apache.org/jira/browse/YARN-1531 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: documentaion Fix For: 2.4.0 Attachments: YARN-1531.2.patch, YARN-1531.3.patch, YARN-1531.patch There are some options which are not written to Yarn Command document. For example, yarn rmadmin command options are as follows: {code} Usage: yarn rmadmin -refreshQueues -refreshNodes -refreshSuperUserGroupsConfiguration -refreshUserToGroupsMappings -refreshAdminAcls -refreshServiceAcl -getGroups [username] -help [cmd] -transitionToActive serviceId -transitionToStandby serviceId -failover [--forcefence] [--forceactive] serviceId serviceId -getServiceState serviceId -checkHealth serviceId {code} But some of the new options such as -getGroups, -transitionToActive, and -transitionToStandby are not documented. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1677) Potential bugs in exception handlers
[ https://issues.apache.org/jira/browse/YARN-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900332#comment-13900332 ]

Ding Yuan commented on YARN-1677:
-

Hi, is there anything else I could help with for these cases?

Potential bugs in exception handlers

Key: YARN-1677
URL: https://issues.apache.org/jira/browse/YARN-1677
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.2.0
Reporter: Ding Yuan
Attachments: yarn-1677.patch

Hi Yarn developers,
We are a group of researchers on software reliability. Recently we did a study and found that the majority of the most severe failures in Hadoop are caused by bugs in exception handling logic. We therefore built a simple checking tool that automatically detects some bug patterns that have caused very severe failures. I am reporting some of the results for Yarn here. Any feedback is much appreciated!

==
Case 1:
Line: 551, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
{noformat}
switch (monitoringEvent.getType()) {
case START_MONITORING_CONTAINER:
  .. ..
default:
  // TODO: Wrong event.
}
{noformat}
The default branch of the switch (which handles any unexpected event) is empty. Should we at least print an error message here?
==
==
Case 2:
Line: 491, File: org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
{noformat}
} catch (Throwable e) {
  // TODO Better error handling. Thread can die with the rest of the
  // NM still running.
  LOG.error("Caught exception in status-updater", e);
}
{noformat}
The handler of this very general exception only logs the error. The TODO seems to indicate that this is not sufficient.
==
==
Case 3:
Line: 861, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
for (LocalResourceStatus stat : remoteResourceStatuses) {
  LocalResource rsrc = stat.getResource();
  LocalResourceRequest req = null;
  try {
    req = new LocalResourceRequest(rsrc);
  } catch (URISyntaxException e) {
    // TODO fail? Already translated several times...
  }
The handler for URISyntaxException is empty, and the TODO seems to indicate that this is not sufficient.
The same code pattern can also be found at:
Line: 901, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
Line: 838, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
Line: 878, File: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
At line: 803, File: org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java, the handler of URISyntaxException also does not seem sufficient:
{noformat}
try {
  shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(
      shellScriptPath)));
} catch (URISyntaxException e) {
  LOG.error("Error when trying to use shell script path specified"
      + " in env, path=" + shellScriptPath);
  e.printStackTrace();
  // A failure scenario on bad input such as invalid shell script path
  // We know we cannot continue launching the container
  // so we should release it.
  // TODO
  numCompletedContainers.incrementAndGet();
  numFailedContainers.incrementAndGet();
  return;
}
{noformat}
==
==
Case 4:
Line: 627, File: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
{noformat}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitonException e) {
  LOG.error("Can't handle this event at current state", e);
  /* TODO fail the application on the failed transition */
}
{noformat}
The handler of this exception only logs the error. The TODO seems to indicate that this is not sufficient.
This exact same code pattern can also be found at:
Line: 573, File: org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
==
==
Case 5: empty handler for exception: java.lang.InterruptedException
Line: 123, File: org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java
{noformat}
public void join() {
  if(proxyServer != null) {
[jira] [Commented] (YARN-1578) Fix how to read history file in FileSystemApplicationHistoryStore
[ https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900396#comment-13900396 ] Hudson commented on YARN-1578: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/]) YARN-1578. Fixed reading incomplete application attempt and container data in FileSystemApplicationHistoryStore. Contributed by Shinichi Yamashita. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567816) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/FileSystemApplicationHistoryStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestFileSystemApplicationHistoryStore.java Fix how to read history file in FileSystemApplicationHistoryStore - Key: YARN-1578 URL: https://issues.apache.org/jira/browse/YARN-1578 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: YARN-321 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Fix For: 2.4.0 Attachments: YARN-1578-2.patch, YARN-1578-3.patch, YARN-1578-4.patch, YARN-1578.patch, application_1390978867235_0001, resoucemanager.log, screenshot.png, screenshot2.pdf I carried out PiEstimator job at Hadoop cluster which applied YARN-321. After the job end and when I accessed Web UI of HistoryServer, it displayed 500. And HistoryServer daemon log was output as follows. 
{code}
2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory/appattempt/appattempt_1389146249925_0008_01
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
(snip...)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
    at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
(snip...)
{code}
From the ApplicationHistory file, I confirmed that there was a container that had not finished. According to the ResourceManager daemon log, the ResourceManager reserved this container but did not allocate it. The problem occurs when FileSystemApplicationHistoryStore reads container information that has no finish data in the history file. We should fix how FileSystemApplicationHistoryStore reads the history file so that it handles the case where finish data is missing.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1641) ZK store should attempt a write periodically to ensure it is still Active
[ https://issues.apache.org/jira/browse/YARN-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900399#comment-13900399 ] Hudson commented on YARN-1641: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/]) YARN-1641. ZK store should attempt a write periodically to ensure it is still Active. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567628) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java ZK store should attempt a write periodically to ensure it is still Active - Key: YARN-1641 URL: https://issues.apache.org/jira/browse/YARN-1641 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Fix For: 2.4.0 Attachments: yarn-1641-1.patch, yarn-1641-2.patch Fencing in ZK store kicks in when the RM tries to write something to the store. If the RM doesn't write anything to the store, it doesn't get fenced and can continue to assume being the Active. By periodically writing a file (say, every RM_ZK_TIMEOUT_MS seconds), we can ensure it gets fenced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1692) ConcurrentModificationException in fair scheduler AppSchedulable
[ https://issues.apache.org/jira/browse/YARN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900395#comment-13900395 ] Hudson commented on YARN-1692: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/]) Move YARN-1692 in CHANGES.txt (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567793) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt YARN-1692. ConcurrentModificationException in fair scheduler AppSchedulable (Sangjin Lee via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567788) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java ConcurrentModificationException in fair scheduler AppSchedulable Key: YARN-1692 URL: https://issues.apache.org/jira/browse/YARN-1692 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.5-alpha Reporter: Sangjin Lee Assignee: Sangjin Lee Fix For: 2.4.0 Attachments: yarn-1692-branch-2.3.patch, yarn-1692.patch We saw a ConcurrentModificationException thrown in the fair scheduler: {noformat} 2014-02-07 01:40:01,978 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Exception in fair scheduler UpdateThread java.util.ConcurrentModificationException at 
java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
    at java.util.HashMap$ValueIterator.next(HashMap.java:954)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.updateDemand(AppSchedulable.java:85)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.updateDemand(FSLeafQueue.java:125)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.updateDemand(FSParentQueue.java:82)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:217)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:195)
    at java.lang.Thread.run(Thread.java:724)
{noformat}
The map returned by FSSchedulerApp.getResourceRequests() is iterated over without proper synchronization.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState
[ https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900392#comment-13900392 ] Hudson commented on YARN-1345: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/]) YARN-1345. Remove FINAL_SAVING state from YarnApplicationAttemptState. Contributed by Zhijie Shen (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567820) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Removing FINAL_SAVING from YarnApplicationAttemptState -- Key: YARN-1345 URL: https://issues.apache.org/jira/browse/YARN-1345 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1345.1.patch, YARN-1345.2.patch Whenever YARN-891 is done, we need to add the mapping of RMAppAttemptState.FINAL_SAVING - YarnApplicationAttemptState.FINAL_SAVING in RMServerUtils#createApplicationAttemptState -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1531) True up yarn command documentation
[ https://issues.apache.org/jira/browse/YARN-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900391#comment-13900391 ] Hudson commented on YARN-1531: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/]) YARN-1531. True up yarn command documentation (Akira Ajisaka via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567775) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm True up yarn command documentation -- Key: YARN-1531 URL: https://issues.apache.org/jira/browse/YARN-1531 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: documentaion Fix For: 2.4.0 Attachments: YARN-1531.2.patch, YARN-1531.3.patch, YARN-1531.patch There are some options which are not written to Yarn Command document. For example, yarn rmadmin command options are as follows: {code} Usage: yarn rmadmin -refreshQueues -refreshNodes -refreshSuperUserGroupsConfiguration -refreshUserToGroupsMappings -refreshAdminAcls -refreshServiceAcl -getGroups [username] -help [cmd] -transitionToActive serviceId -transitionToStandby serviceId -failover [--forcefence] [--forceactive] serviceId serviceId -getServiceState serviceId -checkHealth serviceId {code} But some of the new options such as -getGroups, -transitionToActive, and -transitionToStandby are not documented. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1717) Misc improvements to leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900542#comment-13900542 ] Hadoop QA commented on YARN-1717: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628790/YARN-1717.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3096//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3096//console This message is automatically generated. 
Misc improvements to leveldb timeline store --- Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * braces for all control flow statements * simple locking to prevent issues related to concurrent writes * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1725) RM should provide an easier way for the app to reject a bad allocation
[ https://issues.apache.org/jira/browse/YARN-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900551#comment-13900551 ] Bikas Saha commented on YARN-1725: -- The only way to do it on AMRMClient would be to know which user request this container would match and then submit that user request again to the RM. There is no general way to do that correctly. I guess the problem is similar on the RM side: once it has decremented the *, rack and node counters for requests, it can undo the * counter but does not know what to undo on the rack and node counters. RM should provide an easier way for the app to reject a bad allocation -- Key: YARN-1725 URL: https://issues.apache.org/jira/browse/YARN-1725 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Currently, if the app gets a bad allocation then it can release the container. However, the app now needs to request those resources again or else the RM will not give it a new container in lieu of the one just rejected. This makes the app writer's life hard. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-304. -- Resolution: Duplicate Reviewed the requirement again. It seems that the requirement has already been met by AHS. The finished applications will be served by AHS and the tracking links are persisted if they exist. Please feel free to reopen it if you think something is still missing. RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Zhijie Shen This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900672#comment-13900672 ] Cindy Li commented on YARN-1525: org.apache.hadoop.yarn.client.api.impl.TestNMClient is irrelevant. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
[ https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900725#comment-13900725 ] Vinod Kumar Vavilapalli commented on YARN-1553: --- Seems like you missed this: - MiniMRYarnCluster: Shouldn’t use WebAppUtils for logging the JobHistoryserver’s address Looks good otherwise. Do not use HttpConfig.isSecure() in YARN Key: YARN-1553 URL: https://issues.apache.org/jira/browse/YARN-1553 Project: Hadoop YARN Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: YARN-1553.000.patch, YARN-1553.001.patch, YARN-1553.002.patch, YARN-1553.003.patch, YARN-1553.004.patch, YARN-1553.005.patch, YARN-1553.006.patch, YARN-1553.007.patch, YARN-1553.008.patch HDFS-5305 and related jira decide that each individual project will have their own configuration on http policy. {{HttpConfig.isSecure}} is a global static method which does not fit the design anymore. The same functionality should be moved into the YARN code base. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
[ https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated YARN-1553: - Attachment: YARN-1553.009.patch Address Vinod's comments. Do not use HttpConfig.isSecure() in YARN Key: YARN-1553 URL: https://issues.apache.org/jira/browse/YARN-1553 Project: Hadoop YARN Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: YARN-1553.000.patch, YARN-1553.001.patch, YARN-1553.002.patch, YARN-1553.003.patch, YARN-1553.004.patch, YARN-1553.005.patch, YARN-1553.006.patch, YARN-1553.007.patch, YARN-1553.008.patch, YARN-1553.009.patch HDFS-5305 and related jira decide that each individual project will have their own configuration on http policy. {{HttpConfig.isSecure}} is a global static method which does not fit the design anymore. The same functionality should be moved into the YARN code base. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1727) provide (async) application lifecycle events to management tools
[ https://issues.apache.org/jira/browse/YARN-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900741#comment-13900741 ] Zhijie Shen commented on YARN-1727: --- Could we do it via Timeline service? provide (async) application lifecycle events to management tools Key: YARN-1727 URL: https://issues.apache.org/jira/browse/YARN-1727 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Steve Loughran Management tools need to monitor long-lived applications. While the {{AM-Management System}} protocol is a matter for them, the management tooling will need to know about async events happening in YARN # application submitted # AM started # AM failed # AM restarted # AM finished This could be done by pushing events somewhere, or supporting a pollable history mechanism -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1676) Make admin refreshUserToGroupsMappings of configuration work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900742#comment-13900742 ] Vinod Kumar Vavilapalli commented on YARN-1676: --- +1, looks good. Checking this in. Make admin refreshUserToGroupsMappings of configuration work across RM failover --- Key: YARN-1676 URL: https://issues.apache.org/jira/browse/YARN-1676 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1667.3.patch, YARN-1676.1.patch, YARN-1676.2.patch, YARN-1676.3.patch, YARN-1676.4.patch, YARN-1676.5.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1728) History server doesn't understand percent encoded paths
Abraham Elmahrek created YARN-1728: -- Summary: History server doesn't understand percent encoded paths Key: YARN-1728 URL: https://issues.apache.org/jira/browse/YARN-1728 Project: Hadoop YARN Issue Type: Bug Reporter: Abraham Elmahrek For example, going to the job history server page http://localhost:19888/jobhistory/logs/localhost%3A8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr results in the following error: {code} Cannot get container logs. Invalid nodeId: test-cdh5-hue.ent.cloudera.com%3A8041 {code} Where the url decoded version works: http://localhost:19888/jobhistory/logs/localhost:8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr It seems like both should be supported as the former is simply percent encoding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
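The percent-encoding issue reported above can be sketched as follows. This is only an illustration under assumptions: `NodeIdDecodeSketch` and `decodeNodeId` are hypothetical names, not part of the history server, and the sketch merely shows that decoding the path segment first makes both URL forms yield the same `host:port` string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class NodeIdDecodeSketch {
    // Hypothetical helper: decode a percent-encoded path segment before
    // it is parsed as a host:port node id.
    static String decodeNodeId(String rawSegment) {
        try {
            return URLDecoder.decode(rawSegment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeNodeId("localhost%3A8041"));
        System.out.println(decodeNodeId("localhost:8041"));
    }
}
```

Note that `URLDecoder` implements form decoding, so it also turns `+` into a space; that does not matter for a `host:port` value but would for arbitrary path segments.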
[jira] [Commented] (YARN-1417) RM may issue expired container tokens to AM while issuing new containers.
[ https://issues.apache.org/jira/browse/YARN-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900774#comment-13900774 ] Vinod Kumar Vavilapalli commented on YARN-1417: --- Looks good. +1. Checking this in. RM may issue expired container tokens to AM while issuing new containers. - Key: YARN-1417 URL: https://issues.apache.org/jira/browse/YARN-1417 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Priority: Blocker Attachments: YARN-1417.2.patch, YARN-1417.3.patch Today we create new container token when we create container in RM as a part of schedule cycle. However that container may get reserved or assigned. If the container gets reserved and remains like that (in reserved state) for more than container token expiry interval then RM will end up issuing container with expired token. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1676) Make admin refreshUserToGroupsMappings of configuration work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900778#comment-13900778 ] Hudson commented on YARN-1676: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5165 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5165/]) YARN-1676. Modified RM HA handling of user-to-group mappings to be available across RM failover by making using of a remote configuration-provider. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1568041) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java Make admin refreshUserToGroupsMappings of configuration work across RM failover --- Key: YARN-1676 URL: https://issues.apache.org/jira/browse/YARN-1676 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.4.0 Attachments: YARN-1667.3.patch, YARN-1676.1.patch, YARN-1676.2.patch, YARN-1676.3.patch, YARN-1676.4.patch, YARN-1676.5.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-304: -- RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Zhijie Shen This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900806#comment-13900806 ] Vinod Kumar Vavilapalli commented on YARN-304: -- Can we keep this open to track the removal of the plugin added at YARN-285 ? RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Zhijie Shen This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1717) Misc improvements to leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1717: - Attachment: YARN-1717.5.patch Added fix for passing primary and secondary filters query params from web services to store. Misc improvements to leveldb timeline store --- Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * braces for all control flow statements * simple locking to prevent issues related to concurrent writes * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900869#comment-13900869 ] Zhijie Shen commented on YARN-304: -- Sure, I had an offline discussion with Vinod. Let me summarize the expected behavior of tracking URLs:
1. If the user doesn't specify a tracking URL, let's keep it null.
2. If the application is cached in the RM, it is accessible via the RM web UI. The tracking URL shown on the RM web UI or returned via the report should be:
   if not null - the user-specified tracking URL
   else if AHS is enabled - the URL of the application page on AHS
   else - the URL pointing to itself (RM)
3. If AHS is enabled and the application is recorded, it is accessible via the AHS web UI. The tracking URL shown on the AHS web UI or returned via the report should be:
   if not null - the user-specified tracking URL
   else - the URL pointing to itself (AHS)
RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Zhijie Shen This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
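The tracking URL rules summarized in the comment above can be written down as a small sketch. Everything here is an assumption made for illustration: the class, method, and parameter names are hypothetical and do not correspond to actual RM or AHS code.

```java
public class TrackingUrlSketch {
    // Rule 2: URL reported by the RM for an application it still caches.
    static String rmTrackingUrl(String userUrl, boolean ahsEnabled,
                                String ahsAppUrl, String rmAppUrl) {
        if (userUrl != null) {
            return userUrl;      // user-specified tracking URL wins
        } else if (ahsEnabled) {
            return ahsAppUrl;    // application page on AHS
        } else {
            return rmAppUrl;     // RM points to itself
        }
    }

    // Rule 3: URL reported by AHS for an application it has recorded.
    static String ahsTrackingUrl(String userUrl, String ahsAppUrl) {
        return userUrl != null ? userUrl : ahsAppUrl;
    }
}
```

The sketch does not cover an application that is neither cached in the RM nor recorded by an enabled AHS.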
[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900870#comment-13900870 ] Zhijie Shen commented on YARN-304: -- After the aforementioned thing is done, the plugin should be safe to remove RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Zhijie Shen This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900871#comment-13900871 ] Jian He commented on YARN-713: -- I'd like to take this over. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-713: Assignee: Jian He (was: Omkar Vinit Joshi) ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jian He Priority: Critical Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900883#comment-13900883 ] Jason Lowe commented on YARN-304: - What about the case where the application is not cached in the RM and the AHS is not enabled? That's the case being handled by the plugin today, and isn't covered by the scenarios above. RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Zhijie Shen This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1590) _HOST doesn't expand properly for RM, NM, ProxyServer and JHS
[ https://issues.apache.org/jira/browse/YARN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900900#comment-13900900 ] Vinod Kumar Vavilapalli commented on YARN-1590: --- Looks good, +1. Running it through Jenkins one more time to be sure. _HOST doesn't expand properly for RM, NM, ProxyServer and JHS - Key: YARN-1590 URL: https://issues.apache.org/jira/browse/YARN-1590 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.2.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: YARN-1590.1.patch, YARN-1590.2.patch, YARN-1590.3.patch, YARN-1590.4.patch _HOST is not properly substituted when we use a VIP address. Currently it always uses the host name of the machine and disregards the VIP address. This is true mainly for the RM, NM, WebProxy, and JHS RPC services. It looks like it is working fine for webservice authentication. On the other hand, the same thing is working fine for NN and SNN in RPC as well as webservice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1417) RM may issue expired container tokens to AM while issuing new containers.
[ https://issues.apache.org/jira/browse/YARN-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900904#comment-13900904 ] Hudson commented on YARN-1417: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5166 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5166/]) YARN-1417. Modified RM to generate container-tokens not at creation time, but at allocation time so as to prevent RM from shelling out containers with expired tokens. Contributed by Omkar Vinit Joshi and Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1568060) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java RM may issue expired container tokens to AM while issuing new containers. 
- Key: YARN-1417 URL: https://issues.apache.org/jira/browse/YARN-1417 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Jian He Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1417.2.patch, YARN-1417.3.patch Today we create new container token when we create container in RM as a part of schedule cycle. However that container may get reserved or assigned. If the container gets reserved and remains like that (in reserved state) for more than container token expiry interval then RM will end up issuing container with expired token. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
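The idea in the commit message above, generating the container token at allocation time rather than at creation time, can be sketched roughly as follows. The types and method names are hypothetical and greatly simplified; they are not the classes touched by the patch.

```java
public class LazyTokenSketch {
    // Hypothetical token that only carries an expiry timestamp.
    static final class Token {
        final long expiresAtMillis;
        Token(long expiresAtMillis) { this.expiresAtMillis = expiresAtMillis; }
        boolean isExpired(long nowMillis) { return nowMillis >= expiresAtMillis; }
    }

    // Hypothetical container; the token stays null while it is only reserved.
    static final class Container {
        Token token;
    }

    // Creating or reserving a container does not start the expiry clock.
    static Container reserve() {
        return new Container();
    }

    // The token is minted only when the container is handed to the AM,
    // so time spent in the reserved state cannot use up its lifetime.
    static Container allocate(Container c, long nowMillis, long expiryIntervalMillis) {
        c.token = new Token(nowMillis + expiryIntervalMillis);
        return c;
    }
}
```

With this ordering, a container that stays reserved longer than the expiry interval still receives a fresh token when it is finally allocated.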
[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900916#comment-13900916 ] Zhijie Shen commented on YARN-304: -- Hm..., thanks for pointing this out, Jason! Without assuming AHS is always available with the RM, the plugin still seems to be necessary, because the unavailability of AHS can be considered the same as the original scenario in which the AHS feature did not exist. RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Zhijie Shen This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
[ https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900936#comment-13900936 ] Hadoop QA commented on YARN-1553: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628850/YARN-1553.009.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3097//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3097//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-web-proxy.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3097//console This message is automatically generated. 
Do not use HttpConfig.isSecure() in YARN Key: YARN-1553 URL: https://issues.apache.org/jira/browse/YARN-1553 Project: Hadoop YARN Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: YARN-1553.000.patch, YARN-1553.001.patch, YARN-1553.002.patch, YARN-1553.003.patch, YARN-1553.004.patch, YARN-1553.005.patch, YARN-1553.006.patch, YARN-1553.007.patch, YARN-1553.008.patch, YARN-1553.009.patch HDFS-5305 and related jira decide that each individual project will have their own configuration on http policy. {{HttpConfig.isSecure}} is a global static method which does not fit the design anymore. The same functionality should be moved into the YARN code base. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState
[ https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1345: -- Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Removing FINAL_SAVING from YarnApplicationAttemptState -- Key: YARN-1345 URL: https://issues.apache.org/jira/browse/YARN-1345 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.4.0 Attachments: YARN-1345.1.patch, YARN-1345.2.patch Whenever YARN-891 is done, we need to add the mapping of RMAppAttemptState.FINAL_SAVING - YarnApplicationAttemptState.FINAL_SAVING in RMServerUtils#createApplicationAttemptState -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1345) Removing FINAL_SAVING from YarnApplicationAttemptState
[ https://issues.apache.org/jira/browse/YARN-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1345: -- Affects Version/s: 2.4.0 Removing FINAL_SAVING from YarnApplicationAttemptState -- Key: YARN-1345 URL: https://issues.apache.org/jira/browse/YARN-1345 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.4.0 Attachments: YARN-1345.1.patch, YARN-1345.2.patch Whenever YARN-891 is done, we need to add the mapping of RMAppAttemptState.FINAL_SAVING - YarnApplicationAttemptState.FINAL_SAVING in RMServerUtils#createApplicationAttemptState -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1729) ATSWebServices always passes primary and secondary filters as strings
Billie Rinaldi created YARN-1729: Summary: ATSWebServices always passes primary and secondary filters as strings Key: YARN-1729 URL: https://issues.apache.org/jira/browse/YARN-1729 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Primary filters and secondary filter values can be arbitrary json-compatible Object. The web services should determine if the filters specified as query parameters are objects or strings before passing them to the store. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
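A minimal sketch of the idea in YARN-1729, assuming filter values arrive as raw query-parameter strings. `FilterValueSketch` and `parseValue` are hypothetical names, not the web service's API, and only scalar values are handled; a real implementation would more likely delegate to a JSON parser.

```java
import java.util.regex.Pattern;

final class FilterValueSketch {
    // Matches the JSON number grammar: optional sign, integer part, optional fraction and exponent.
    private static final Pattern JSON_NUMBER =
            Pattern.compile("-?(0|[1-9]\\d*)(\\.\\d+)?([eE][+-]?\\d+)?");

    /** Interprets a raw query-parameter value as a JSON-style scalar where possible. */
    static Object parseValue(String raw) {
        String s = raw.trim();
        if (s.equals("true") || s.equals("false")) {
            return Boolean.valueOf(s);
        }
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            // Explicitly quoted: keep it a string (escape sequences are not decoded in this sketch).
            return s.substring(1, s.length() - 1);
        }
        if (JSON_NUMBER.matcher(s).matches()) {
            try {
                return Long.valueOf(s);
            } catch (NumberFormatException notALong) {
                // Fractions, exponents, and out-of-range integers fall back to a double.
                return Double.valueOf(s);
            }
        }
        // Anything else is passed to the store unchanged, as a string.
        return raw;
    }
}
```

With this approach a filter such as `12345` reaches the store as a number, while `"12345"` (quoted) stays a string, so a caller can still force string matching.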
[jira] [Created] (YARN-1730) Leveldb timeline store needs simple write locking
Billie Rinaldi created YARN-1730: Summary: Leveldb timeline store needs simple write locking Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
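The locking described in YARN-1730 might look roughly like the following. This is a sketch under assumptions: an in-memory map stands in for the leveldb store, a single global lock is used for simplicity, and `StartTimeSketch` and `getOrRecordStartTime` are hypothetical names rather than anything from the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

final class StartTimeSketch {
    // Stand-in for the persisted entity start times; the real store is leveldb.
    private final Map<String, Long> startTimes = new HashMap<>();
    private final ReentrantLock writeLock = new ReentrantLock();

    /**
     * Returns the start time already recorded for the entity, or records and
     * returns the candidate if none exists. The lookup and the insert happen
     * under one lock so two concurrent writers cannot both decide they are first.
     */
    long getOrRecordStartTime(String entityId, long candidate) {
        writeLock.lock();
        try {
            Long existing = startTimes.get(entityId);
            if (existing != null) {
                return existing;
            }
            startTimes.put(entityId, candidate);
            return candidate;
        } finally {
            writeLock.unlock();
        }
    }
}
```

The batch write itself stays outside this sketch, since the report says it is already atomic; only the check-then-record step for the start time needs the lock.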
[jira] [Updated] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1730: - Attachment: YARN-1730.1.patch Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1731) ResourceManager should record killed ApplicationMasters for History
[ https://issues.apache.org/jira/browse/YARN-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-1731: Attachment: YARN-1731.patch I’ve attached a preliminary version of the patch. Once we all agree on the specifics of the design, I can add unit tests. It writes an empty file, named after the app attempt id and user, to a directory in HDFS, where the JHS (or something else, such as the AHS) can then see it. We can easily replace HDFS with some other FileSystem or mechanism, or make it pluggable in some fashion. ResourceManager should record killed ApplicationMasters for History --- Key: YARN-1731 URL: https://issues.apache.org/jira/browse/YARN-1731 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1731.patch Yarn changes required for MAPREDUCE-5641 to make the RM record when an AM is killed so the JHS (or something else) can know about it. See MAPREDUCE-5641 for the design I'm trying to follow. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
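The marker-file approach from the YARN-1731 comment can be sketched as follows. To stay self-contained this uses `java.nio.file` on the local filesystem as a stand-in for HDFS; the class name, method name, and the `attemptId_user` file-name layout are assumptions for illustration, not details taken from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class KilledAmMarkerSketch {
    /**
     * Records a killed attempt by creating an empty file whose name carries
     * the attempt id and user. Calling it again for the same attempt is a no-op.
     */
    static Path recordKilledAttempt(Path markerDir, String attemptId, String user) {
        try {
            Files.createDirectories(markerDir);
            Path marker = markerDir.resolve(attemptId + "_" + user);
            try {
                Files.createFile(marker);
            } catch (FileAlreadyExistsException alreadyRecorded) {
                // The attempt was recorded earlier; nothing more to do.
            }
            return marker;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader such as a history server would then only need to list the directory and split each file name to learn which attempts were killed and for which user.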
[jira] [Created] (YARN-1731) ResourceManager should record killed ApplicationMasters for History
Robert Kanter created YARN-1731: --- Summary: ResourceManager should record killed ApplicationMasters for History Key: YARN-1731 URL: https://issues.apache.org/jira/browse/YARN-1731 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-1731.patch Yarn changes required for MAPREDUCE-5641 to make the RM record when an AM is killed so the JHS (or something else) can know about it. See MAPREDUCE-5641 for the design I'm trying to follow. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1717) Misc improvements to leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901060#comment-13901060 ] Hadoop QA commented on YARN-1717: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628880/YARN-1717.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3098//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3098//console This message is automatically generated. 
Misc improvements to leveldb timeline store --- Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * braces for all control flow statements * simple locking to prevent issues related to concurrent writes * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
[ https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated YARN-1553: - Attachment: YARN-1553.010.patch Fix the findbugs warning. Do not use HttpConfig.isSecure() in YARN Key: YARN-1553 URL: https://issues.apache.org/jira/browse/YARN-1553 Project: Hadoop YARN Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: YARN-1553.000.patch, YARN-1553.001.patch, YARN-1553.002.patch, YARN-1553.003.patch, YARN-1553.004.patch, YARN-1553.005.patch, YARN-1553.006.patch, YARN-1553.007.patch, YARN-1553.008.patch, YARN-1553.009.patch, YARN-1553.010.patch HDFS-5305 and related jira decide that each individual project will have their own configuration on http policy. {{HttpConfig.isSecure}} is a global static method which does not fit the design anymore. The same functionality should be moved into the YARN code base. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC
[ https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1515: Attachment: YARN-1515.v03.patch v03 patch with combined RPC. Hopefully easier to review :) {code} $ wc YARN-1515.v0* 1664 5725 75045 YARN-1515.v02.patch 982 3401 45101 YARN-1515.v03.patch {code} Ability to dump the container threads and stop the containers in a single RPC - Key: YARN-1515 URL: https://issues.apache.org/jira/browse/YARN-1515 Project: Hadoop YARN Issue Type: New Feature Components: api, nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, YARN-1515.v03.patch This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1732) Change types of related entities and primary filters in ATSEntity
Billie Rinaldi created YARN-1732: Summary: Change types of related entities and primary filters in ATSEntity Key: YARN-1732 URL: https://issues.apache.org/jira/browse/YARN-1732 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi The current types Map<String, List<String>> relatedEntities and Map<String, Object> primaryFilters have issues. The List<String> value of the related entities map could have multiple identical strings in it, which doesn't make sense. A more major issue is that we cannot allow primary filter values to be overwritten, because otherwise we will be unable to find those primary filter entries when we want to delete an entity (without doing a nearly full scan). I propose changing related entities to Map<String, Set<String>> and primary filters to Map<String, Set<Object>>. The basic methods to add primary filters and related entities are of the form add(key, value) and will not need to change. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
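The type change proposed in YARN-1732 can be sketched as below. `EntitySketch` is a hypothetical stand-in for ATSEntity, and the method names are illustrative; only the map and set shapes and the add(key, value) form come from the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class EntitySketch {
    // Set values: duplicates collapse, and adding a value never overwrites an earlier one.
    private final Map<String, Set<String>> relatedEntities = new HashMap<>();
    private final Map<String, Set<Object>> primaryFilters = new HashMap<>();

    /** add(key, value) form: the caller's signature is unaffected by the List-to-Set change. */
    void addRelatedEntity(String entityType, String entityId) {
        relatedEntities.computeIfAbsent(entityType, k -> new HashSet<>()).add(entityId);
    }

    /** A second value under the same key is kept alongside the first. */
    void addPrimaryFilter(String key, Object value) {
        primaryFilters.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    Map<String, Set<String>> getRelatedEntities() {
        return relatedEntities;
    }

    Map<String, Set<Object>> getPrimaryFilters() {
        return primaryFilters;
    }
}
```

Because every primary filter value ever added remains in the set, a later delete can enumerate exactly the filter entries it has to remove, which is the concern the description raises about overwriting.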
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901081#comment-13901081 ] Junping Du commented on YARN-1506: -- Agree with Arun that this is not a blocker. Hi [~bikassaha], Thanks for your review and comments. Sorry for the late reply; I just came back from a long vacation. Please see my replies below: bq. ADMIN_RESOURCE_UPDATE instead of RESOURCE_UPDATE for the enum would help clarify that it's a forced admin update. Ok. Will update it. bq. Why not update the total capability here also (like we do for non-running node). When the node reports back as healthy then we would probably need the new resource value, right? For a node that is unusable (unhealthy, LOST or decommissioned), I think it may be simpler to just log a warning rather than apply any change. Otherwise the user may be misled into thinking that the node is still usable. Thoughts? bq. Why are we doing this indirect subtraction via delta instead of simply clusterResource-=old; clusterResource+=new. It's the same number of operations and less confusing to read. Good point. Will update it. bq. I think it's crucial to have a more complete test (maybe using mockRM) that verifies the flow from admin service to the scheduler. Most interesting would be the case when the node is fully allocated and then an update reduces the capacity, thus resulting in a -ve value of available resource on the node. I am wary that this case may have bugs in handling the -ve value in existing scheduler code because it's unexpected. It's fine for the test to use the default scheduler. Agree. Although I am pretty sure it works fine so far, based on my offline integration test, we have to add a unit test covering the resource over-commitment case so that future changes won't break these assumptions. Will update the patch soon. Replace set resource change on RMNode/SchedulerNode directly with event notification. 
- Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
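The two review points discussed in the YARN-1506 comment, the direct subtract-then-add update and the negative available resource after a forced reduction, can be illustrated with a small sketch. It tracks memory only, as plain integers; `NodeSketch` and `ClusterResourceSketch` are hypothetical and do not mirror the scheduler's Resource classes.

```java
final class NodeSketch {
    int totalMb;
    int usedMb;

    NodeSketch(int totalMb, int usedMb) {
        this.totalMb = totalMb;
        this.usedMb = usedMb;
    }

    /** May be negative after an admin shrinks a fully allocated node (over-commitment). */
    int availableMb() {
        return totalMb - usedMb;
    }
}

final class ClusterResourceSketch {
    int clusterTotalMb;

    ClusterResourceSketch(int clusterTotalMb) {
        this.clusterTotalMb = clusterTotalMb;
    }

    /** Direct form suggested in the review: remove the old capacity, then add the new one. */
    void updateNodeResource(NodeSketch node, int newTotalMb) {
        clusterTotalMb -= node.totalMb;
        clusterTotalMb += newTotalMb;
        node.totalMb = newTotalMb;
    }
}
```

For example, shrinking a fully used 8192 MB node to 4096 MB in a 16384 MB cluster leaves the cluster at 12288 MB and the node with -4096 MB available, which is the case the requested unit test is meant to cover.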
[jira] [Updated] (YARN-1732) Change types of related entities and primary filters in ATSEntity
[ https://issues.apache.org/jira/browse/YARN-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1732: - Attachment: YARN-1732.1.patch Change types of related entities and primary filters in ATSEntity - Key: YARN-1732 URL: https://issues.apache.org/jira/browse/YARN-1732 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1732.1.patch The current types Map<String, List<String>> relatedEntities and Map<String, Object> primaryFilters have issues. The List<String> value of the related entities map could have multiple identical strings in it, which doesn't make sense. A more major issue is that we cannot allow primary filter values to be overwritten, because otherwise we will be unable to find those primary filter entries when we want to delete an entity (without doing a nearly full scan). I propose changing related entities to Map<String, Set<String>> and primary filters to Map<String, Set<Object>>. The basic methods to add primary filters and related entities are of the form add(key, value) and will not need to change. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1729) ATSWebServices always passes primary and secondary filters as strings
[ https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1729: - Attachment: YARN-1729.1.patch ATSWebServices always passes primary and secondary filters as strings - Key: YARN-1729 URL: https://issues.apache.org/jira/browse/YARN-1729 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1729.1.patch Primary filters and secondary filter values can be arbitrary json-compatible Object. The web services should determine if the filters specified as query parameters are objects or strings before passing them to the store. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1717: - Summary: Enable offline deletion of entries in leveldb timeline store (was: Misc improvements to leveldb timeline store) Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * braces for all control flow statements * simple locking to prevent issues related to concurrent writes * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1717: - Description: The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities was: The leveldb timeline store implementation needs the following: * better documentation of its internal structures * braces for all control flow statements * simple locking to prevent issues related to concurrent writes * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store
[ https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1717: - Attachment: YARN-1717.6.patch Enable offline deletion of entries in leveldb timeline store Key: YARN-1717 URL: https://issues.apache.org/jira/browse/YARN-1717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1717.1.patch, YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, YARN-1717.6.patch The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1590) _HOST doesn't expand properly for RM, NM, ProxyServer and JHS
[ https://issues.apache.org/jira/browse/YARN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901093#comment-13901093 ] Hadoop QA commented on YARN-1590: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12626945/YARN-1590.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3099//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3099//console This message is automatically generated. 
_HOST doesn't expand properly for RM, NM, ProxyServer and JHS - Key: YARN-1590 URL: https://issues.apache.org/jira/browse/YARN-1590 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.2.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: YARN-1590.1.patch, YARN-1590.2.patch, YARN-1590.3.patch, YARN-1590.4.patch _HOST is not properly substituted when we use a VIP address. Currently it always uses the host name of the machine and disregards the VIP address. This mainly affects the RM, NM, WebProxy, and JHS RPC services. It appears to work fine for web service authentication. By contrast, the same substitution works fine for the NN and SNN in both RPC and web services. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
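The substitution described in YARN-1590 can be sketched as follows. This is an illustration only: `HostPatternSketch`, `resolvePrincipal`, and the idea of passing a separately configured host are assumptions, and the example host names are made up; it is not the code path the patch changes.

```java
final class HostPatternSketch {
    static final String HOST_PATTERN = "_HOST";

    /**
     * Expands a principal of the form service/_HOST@REALM. The configured
     * (for example VIP) host wins when present; otherwise the machine's own
     * host name is used. Principals without the _HOST placeholder are returned unchanged.
     */
    static String resolvePrincipal(String principalConfig, String configuredHost, String localHost) {
        String[] parts = principalConfig.split("[/@]");
        if (parts.length != 3 || !parts[1].equals(HOST_PATTERN)) {
            return principalConfig;
        }
        String host = (configuredHost == null || configuredHost.isEmpty()) ? localHost : configuredHost;
        return parts[0] + "/" + host + "@" + parts[2];
    }
}
```

The behavior reported in the issue corresponds to always taking the `localHost` branch, even when a VIP address is configured for the service.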
[jira] [Commented] (YARN-1732) Change types of related entities and primary filters in ATSEntity
[ https://issues.apache.org/jira/browse/YARN-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901119#comment-13901119 ] Hadoop QA commented on YARN-1732: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628944/YARN-1732.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3102//console This message is automatically generated. Change types of related entities and primary filters in ATSEntity - Key: YARN-1732 URL: https://issues.apache.org/jira/browse/YARN-1732 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1732.1.patch The current types Map<String, List<String>> relatedEntities and Map<String, Object> primaryFilters have issues. The List<String> value of the related entities map could have multiple identical strings in it, which doesn't make sense. A more major issue is that we cannot allow primary filter values to be overwritten, because otherwise we will be unable to find those primary filter entries when we want to delete an entity (without doing a nearly full scan). I propose changing related entities to Map<String, Set<String>> and primary filters to Map<String, Set<Object>>. The basic methods to add primary filters and related entities are of the form add(key, value) and will not need to change. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901123#comment-13901123 ] Hadoop QA commented on YARN-1730: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628926/YARN-1730.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3100//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3100//console This message is automatically generated. Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC
[ https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901137#comment-13901137 ] Hadoop QA commented on YARN-1515: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628941/YARN-1515.v03.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3101//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3101//console This message is automatically generated. 
Ability to dump the container threads and stop the containers in a single RPC - Key: YARN-1515 URL: https://issues.apache.org/jira/browse/YARN-1515 Project: Hadoop YARN Issue Type: New Feature Components: api, nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, YARN-1515.v03.patch This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1666: Attachment: YARN-1666.5.patch Make admin refreshNodes work across RM failover --- Key: YARN-1666 URL: https://issues.apache.org/jira/browse/YARN-1666 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1666) Make admin refreshNodes work across RM failover
[ https://issues.apache.org/jira/browse/YARN-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901142#comment-13901142 ]

Xuan Gong commented on YARN-1666:

bq. It doesn't seem to apply against latest trunk anymore. Please update.
Updated.

bq. Fix the file-name constants in YarnConfiguration to be consistent.
Changed YARN_SITE_XML_FILE to YARN_SITE_CONFIGURATION_FILE, and YARN_DEFAULT_XML_FILE to YARN_DEFAULT_CONFIGURATION_FILE.

bq. FileSystemBasedConfigurationProvider.getConfiguration(): Let's always throw exceptions instead of returning nulls in some cases.
Added.

bq. LocalConfigurationProvider: We need to first find the location of the XML file in the classpath if it is one of RM_CONFIGURATION_FILES, right?
Changed.

bq. In AdminService, where you use new Configuration(), should you use new Configuration(false)?
It is fine as is, because we load the related configuration later with addResource(Configuration), which reloads and overwrites all the properties.

bq. I think we can simply get rid of the LocalConfigurationProvider instance checks everywhere now.
Yes, we can do that.

bq. inFile -> includesFile and exFile -> excludesFile
Done.

bq. Make both the above as class-fields and use it in refreshNodes method too.
Done.

bq. disableHostsFileReader() should also use the remote-conf provider?
Yes, changed.

bq. HostsFileReader: Change the constructor to not need both the file-names as well as the streams
I think we still need both. The constructor calls the refresh API, which prints which hosts are included or excluded and which include or exclude file they came from. For that we need to pass the file names.

I also made several other changes:
* Moved the logic that loads capacity-scheduler.xml from AdminService#refreshQueues to CapacityScheduler#reinitialize().
* Added empty yarn-site.xml and hadoop-policy.xml under hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources to make the tests pass, because LocalConfigurationProvider now loads those configuration files from the classpath.
* In NodeListManager, we now check whether the file name is empty or null. If it is, we pass null as the InputStream. Both LocalConfigurationProvider and FileSystemBasedConfigurationProvider throw an exception when asked for an InputStream with an empty or null file name, but NodeListManager allows such a value: it simply disables the HostsFileReader. So we should do this check before we actually create the InputStream.

Make admin refreshNodes work across RM failover
---
Key: YARN-1666
URL: https://issues.apache.org/jira/browse/YARN-1666
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1666.1.patch, YARN-1666.2.patch, YARN-1666.2.patch, YARN-1666.3.patch, YARN-1666.4.patch, YARN-1666.4.patch, YARN-1666.5.patch
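
The NodeListManager check described in the comment above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the patch: StreamProvider, InMemoryProvider and openHostsFile are hypothetical names invented here, and the provider is only a stand-in for the real configuration providers. It assumes, as the comment states, that a provider throws on an empty or null file name while the caller treats such a name as "hosts file disabled".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class HostsFileStreams {

  /** Hypothetical stand-in for a configuration provider; not the Hadoop interface. */
  interface StreamProvider {
    InputStream open(String fileName) throws IOException;
  }

  /** In-memory provider that throws instead of returning null, as the review asked. */
  static final class InMemoryProvider implements StreamProvider {
    private final Map<String, String> files = new HashMap<>();

    InMemoryProvider put(String name, String content) {
      files.put(name, content);
      return this;
    }

    @Override
    public InputStream open(String fileName) throws IOException {
      if (fileName == null || fileName.isEmpty()) {
        throw new IOException("Invalid file name: " + fileName);
      }
      String content = files.get(fileName);
      if (content == null) {
        throw new IOException("File not found: " + fileName);
      }
      return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }
  }

  /**
   * Caller-side guard: an empty or null name means "no hosts file configured",
   * so return null without asking the provider (which would throw).
   */
  static InputStream openHostsFile(StreamProvider provider, String fileName)
      throws IOException {
    if (fileName == null || fileName.trim().isEmpty()) {
      return null;
    }
    return provider.open(fileName);
  }

  public static void main(String[] args) throws IOException {
    InMemoryProvider provider = new InMemoryProvider().put("include.txt", "host1\n");
    System.out.println("configured: " + (openHostsFile(provider, "include.txt") != null));
    System.out.println("disabled:   " + (openHostsFile(provider, "") == null));
  }
}
```

Keeping the guard in the caller lets the provider stay strict (always throwing on bad input) while the one component that legitimately accepts an unset file name handles that case itself.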
[jira] [Updated] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC
[ https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gera Shegalov updated YARN-1515:
Attachment: YARN-1515.v04.patch

Need to escape the inner class name in the test script because it contains '$'.

Ability to dump the container threads and stop the containers in a single RPC
---
Key: YARN-1515
URL: https://issues.apache.org/jira/browse/YARN-1515
Project: Hadoop YARN
Issue Type: New Feature
Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, YARN-1515.v03.patch, YARN-1515.v04.patch

This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts.