[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970579#comment-14970579 ] Hadoop QA commented on YARN-4041: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 21m 9s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 11s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 3s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 42s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 66m 37s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 115m 17s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768222/0005-YARN-4041.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 124a412 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9540/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9540/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9540/console | This message was automatically generated. > Slow delegation token renewal can severely prolong RM recovery > -- > > Key: YARN-4041 > URL: https://issues.apache.org/jira/browse/YARN-4041 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, > 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch > > > When the RM does a work-preserving restart it synchronously tries to renew > delegation tokens for every active application. If a token server happens to > be down or is running slow and a lot of the active apps were using tokens > from that server then it can have a huge impact on the time it takes the RM > to process the restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
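The description above notes that the RM renews every recovered application's delegation tokens synchronously during a work-preserving restart. As a hedged illustration only (not the approach taken in 0005-YARN-4041.patch, and with hypothetical class and method names), the sketch below shows the general idea of decoupling renewal from the recovery path by handing each renewal to a background pool.
{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: renew recovered applications' tokens off the recovery
// path so a slow or unreachable token server cannot stall the whole restart.
public class AsyncTokenRenewalSketch {
  private final ExecutorService renewerPool = Executors.newFixedThreadPool(4);

  public void recoverApplications(List<String> appIds) {
    for (String appId : appIds) {
      // Submit each renewal instead of calling it inline so recovery of the
      // remaining applications proceeds even if one token server is slow.
      renewerPool.submit(() -> renewTokensFor(appId));
    }
  }

  private void renewTokensFor(String appId) {
    // placeholder for the per-application delegation token renewal call
  }
}
{code}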
[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning
[ https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3344: Target Version/s: 2.8.0, 2.7.2 > procfs stat file is not in the expected format warning > -- > > Key: YARN-3344 > URL: https://issues.apache.org/jira/browse/YARN-3344 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jon Bringhurst >Assignee: Ravindra Kumar Naik > Attachments: YARN-3344-trunk.005.patch > > > Although this doesn't appear to be causing any functional issues, it is > spamming our log files quite a bit. :) > It appears that the regex in ProcfsBasedProcessTree doesn't work for all > /proc//stat files. > Here's the error I'm seeing: > {noformat} > "source_host": "asdf", > "method": "constructProcessInfo", > "level": "WARN", > "message": "Unexpected: procfs stat file is not in the expected format > for process with pid 6953" > "file": "ProcfsBasedProcessTree.java", > "line_number": "514", > "class": "org.apache.hadoop.yarn.util.ProcfsBasedProcessTree", > {noformat} > And here's the basic info on process with pid 6953: > {noformat} > [asdf ~]$ cat /proc/6953/stat > 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 > 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 > 2 18446744073709551615 0 0 17 13 0 0 0 0 0 > [asdf ~]$ ps aux|grep 6953 > root 6953 0.0 0.0 200484 23424 ?S21:44 0:00 python2.6 > /export/apps/salt/minion-scripts/module-sync.py > jbringhu 13481 0.0 0.0 105312 872 pts/0S+ 22:13 0:00 grep -i 6953 > [asdf ~]$ > {noformat} > This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
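The stat line above shows the likely trigger: the command field "(python2.6 /expo)" contains a space, which a purely whitespace-delimited regex will not match. Below is a minimal parsing sketch (not the ProcfsBasedProcessTree patch itself) that tolerates spaces in the command name by splitting on the last closing parenthesis.
{code}
public class ProcStatParseSketch {
  public static void main(String[] args) {
    // Abbreviated copy of the stat line from the report above.
    String stat = "6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364";
    int open = stat.indexOf('(');
    int close = stat.lastIndexOf(')');
    String pid = stat.substring(0, open).trim();        // "6953"
    String comm = stat.substring(open + 1, close);      // "python2.6 /expo"
    String[] rest = stat.substring(close + 1).trim().split("\\s+");
    String state = rest[0];                             // "S"
    String ppid = rest[1];                              // "1871"
    System.out.println(pid + " [" + comm + "] " + state + " " + ppid);
  }
}
{code}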
[jira] [Updated] (YARN-4285) Display resource usage as percentage of queue and cluster in the RM UI
[ https://issues.apache.org/jira/browse/YARN-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4285: Attachment: YARN-4285.003.patch bq. 1) Since ApplicationResourceUsageReport is public API, I suggest to rename parameter, getter/setter name from Perc to Percentage. Same as yarn_protos.proto. Fixed. {quote} 2) if (rmContext.getScheduler() instanceof YarnScheduler) { calc = rmContext.getScheduler().getResourceCalculator(); } rmContext.getScheduler() should be always YarnScheduler, correct? This check maybe not required. {quote} Good catch! Fixed. bq. 3) Is it better to change int percentage to float, for AppInfo and other APIs. Fixed. bq. 4) Since this is not a trivial patch, could you add some tests to verify if SchedulerApplicationAttempt can return percentage properly? Fixed. > Display resource usage as percentage of queue and cluster in the RM UI > -- > > Key: YARN-4285 > URL: https://issues.apache.org/jira/browse/YARN-4285 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4285.001.patch, YARN-4285.002.patch, > YARN-4285.003.patch > > > Currently, we display the memory and vcores allocated to an app in the RM UI. > It would be useful to display the resources consumed as a %of the queue and > the cluster to identify apps that are using a lot of resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
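To illustrate the int-versus-float point discussed above, here is a small sketch with made-up numbers (not the code in YARN-4285.003.patch): an int percentage would round a small app on a large cluster down to 0, while a float keeps the information.
{code}
public final class UsagePercentageSketch {
  static float percentOf(long used, long total) {
    return total <= 0 ? 0f : (used * 100.0f) / total;
  }

  public static void main(String[] args) {
    long usedMemMB = 4096, clusterMemMB = 1048576;   // hypothetical values
    long usedVcores = 3, clusterVcores = 512;
    float memPct = percentOf(usedMemMB, clusterMemMB);
    float vcorePct = percentOf(usedVcores, clusterVcores);
    // Reporting the dominant share is one reasonable choice when a single
    // number is wanted; the patch may combine resources differently.
    System.out.printf("cluster usage: %.2f%%%n", Math.max(memPct, vcorePct));
  }
}
{code}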
[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970608#comment-14970608 ] Hadoop QA commented on YARN-3216: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 45s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 8m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 55s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 12s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 61m 12s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 105m 48s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768224/0011-YARN-3216.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 124a412 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9541/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9541/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9541/console | This message was automatically generated. > Max-AM-Resource-Percentage should respect node labels > - > > Key: YARN-3216 > URL: https://issues.apache.org/jira/browse/YARN-3216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, > 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, > 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, > 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch > > > Currently, max-am-resource-percentage considers default_partition only. When > a queue can access multiple partitions, we should be able to compute > max-am-resource-percentage based on that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from failover mode to non-failover mode
[ https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970709#comment-14970709 ] Varun Saxena commented on YARN-4127: This failure is due to branch-2.7 patch(as QA tries to apply it on trunk). The QA report for patch on trunk is above and fine. > RM fail with noAuth error if switched from failover mode to non-failover mode > -- > > Key: YARN-4127 > URL: https://issues.apache.org/jira/browse/YARN-4127 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Jian He >Assignee: Varun Saxena > Attachments: YARN-4127-branch-2.7.01.patch, YARN-4127.01.patch, > YARN-4127.02.patch > > > The scenario is that RM failover was initially enabled, so the zkRootNodeAcl > is by default set with the *RM ID* in the ACL string > If RM failover is then switched to be disabled, it cannot load data from ZK > and fail with noAuth error. After I reset the root node ACL, it again can > access. > {code} > 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194) > {code} > the problem may be that in non-failover 
mode, RM doesn't use the *RM-ID* to > connect with ZK and thus fail with no Auth error. > We should be able to switch failover on and off with no interruption to the > user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
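For reference, a hedged sketch of the manual workaround mentioned in the description ("After I reset the root node ACL, it again can access"). The znode path assumes the default ZKRMStateStore layout, and the open ACL is only for illustration; a production cluster would want a restricted ACL instead.
{code}
import java.util.List;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Stat;

public class ResetRmRootAclSketch {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, event -> { });
    String path = "/rmstore/ZKRMStateRoot";   // assumed default RM state root
    Stat stat = new Stat();
    List<ACL> current = zk.getACL(path, stat);
    System.out.println("current ACL: " + current);
    // Replace the failover-specific (RM-ID digest) ACL so a non-HA RM can read the node.
    zk.setACL(path, ZooDefs.Ids.OPEN_ACL_UNSAFE, stat.getAversion());
    zk.close();
  }
}
{code}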
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970719#comment-14970719 ] Junping Du commented on YARN-3223: -- Thanks [~brookz] for updating the patch! I am generally fine with your approach as well. However, I don't think we should introduce a new boolean parameter to indicate whether the node is decommissioning. RMNode itself (RMNode.getState()) already includes the necessary info, so the boolean parameter seems redundant, doesn't it? {code} - public NodeUpdateSchedulerEvent(RMNode rmNode) { + public NodeUpdateSchedulerEvent(RMNode rmNode, boolean isDecommissioning) { super(SchedulerEventType.NODE_UPDATE); this.rmNode = rmNode; +this.isDecommissioning = isDecommissioning; } {code} I also notice that many of the test changes are tied to this parameter, so removing it would make your patch more concise. > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch > > > During NM graceful decommission, we should handle resource update properly, > include: make RMNode keep track of old resource for possible rollback, keep > available resource to 0 and used resource get updated when > container finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
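A minimal sketch of the alternative suggested above, i.e. the scheduler reading the state off the RMNode it already receives instead of taking a new boolean (NodeState.DECOMMISSIONING is assumed to exist as part of the graceful-decommission work; the handler shape is illustrative, not the patch).
{code}
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent;

public class NodeUpdateHandlingSketch {
  void handleNodeUpdate(NodeUpdateSchedulerEvent event) {
    RMNode rmNode = event.getRMNode();
    // No extra flag needed: the RMNode already carries its state.
    if (rmNode.getState() == NodeState.DECOMMISSIONING) {
      // keep available resource at 0; shrink total only as containers finish
    }
  }
}
{code}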
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970726#comment-14970726 ] Junping Du commented on YARN-3223: -- BTW, it sounds like all of the test changes are related to the parameter update mentioned above (correct me if I am wrong). I think we need a separate test case to cover the resource update during NM decommissioning (available resource consistently staying at 0). > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch > > > During NM graceful decommission, we should handle resource update properly, > include: make RMNode keep track of old resource for possible rollback, keep > available resource to 0 and used resource get updated when > container finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970734#comment-14970734 ] Varun Vasudev commented on YARN-4009: - Makes sense. +1 for the latest patch. > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.8.patch, > YARN-4009.LOGGING.patch, YARN-4009.LOGGING.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4237) Support additional queries for ATSv2 Web UI
[ https://issues.apache.org/jira/browse/YARN-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970740#comment-14970740 ] Varun Saxena commented on YARN-4237: Thanks [~gtCarrera9] for the review and commit . Thanks [~sjlee0] for the review. > Support additional queries for ATSv2 Web UI > --- > > Key: YARN-4237 > URL: https://issues.apache.org/jira/browse/YARN-4237 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: YARN-2928 > > Attachments: YARN-4237-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4278) On AM registration, response should include cluster Nodes report on demanded by registration request.
[ https://issues.apache.org/jira/browse/YARN-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970750#comment-14970750 ] Steve Loughran commented on YARN-4278: -- Bikas, SLIDER-947 suggests exactly that as a workaround; my concerns there relate to token expiry and the like...we'll be using a delegation token unless we have a keytab. FWIW I'm using a yarn client in SPARK-1537 without problems so far regarding network load, that's why I've suggested it should be an option on register/resync. Given that clients can ask for it, and AM registration is, excluding failures, the same as the #of client launches, I doubt it'd be that more expensive. Except: what would happen on any failover? > On AM registration, response should include cluster Nodes report on demanded > by registration request. > -- > > Key: YARN-4278 > URL: https://issues.apache.org/jira/browse/YARN-4278 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > > From the yarn-dev mailing list discussion thread > [Thread-1|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c0ee80f6f7a98a64ebd18f2be839c91156798a...@szxeml512-mbs.china.huawei.com%3E] > > [Thread-2|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c4f7812fc-ab5d-465d-ac89-824735698...@hortonworks.com%3E] > > Slider required to know about cluster nodes details for providing support for > affinity/anti-affinity on containers. > Current behavior : During life span of application , updatedNodes are sent in > allocate request only if there are any change like added/removed/'state > change' in the nodes. Otherwise cluster nodes not updated to AM. > One of the approach thought by [~ste...@apache.org] is while AM registration > let response hold the cluster nodes report -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970794#comment-14970794 ] Junping Du commented on YARN-3224: -- Hi [~sunilg], sorry for coming to this late. It sounds like this JIRA depends on YARN-3784 going in first, doesn't it? If so, I will look at that one first. Thanks for your patience! > Notify AM with containers (on decommissioning node) could be preempted after > timeout. > - > > Key: YARN-3224 > URL: https://issues.apache.org/jira/browse/YARN-3224 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3224.patch, 0002-YARN-3224.patch > > > We should leverage YARN preemption framework to notify AM that some > containers will be preempted after a timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3784) Indicate preemption timout along with the list of containers to AM (preemption message)
[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970802#comment-14970802 ] Junping Du commented on YARN-3784: -- bq. AM protocol changes are handled in this ticket along with CS change. I feel its better to track FairScheduler changes separately and it can be linked to this. The plan sounds reasonable. YARN-3224 depends on this JIRA. [~sunilg], I am reviewing your patch now but it sounds like the patch is already stale. Can you update the patch with latest trunk? Thanks! > Indicate preemption timout along with the list of containers to AM > (preemption message) > --- > > Key: YARN-3784 > URL: https://issues.apache.org/jira/browse/YARN-3784 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch > > > Currently during preemption, AM is notified with a list of containers which > are marked for preemption. Introducing a timeout duration also along with > this container list so that AM can know how much time it will get to do a > graceful shutdown to its containers (assuming one of preemption policy is > loaded in AM). > This will help in decommissioning NM scenarios, where NM will be > decommissioned after a timeout (also killing containers on it). This timeout > will be helpful to indicate AM that those containers can be killed by RM > forcefully after the timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4278) On AM registration, response should include cluster Nodes report on demanded by registration request.
[ https://issues.apache.org/jira/browse/YARN-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970840#comment-14970840 ] Bikas Saha commented on YARN-4278: -- Client can ask for it but IMO, no clients actually do. Hence we probably don't have enough evidence to suggest that it might work in practice, specially at large deployments like Yahoo. > On AM registration, response should include cluster Nodes report on demanded > by registration request. > -- > > Key: YARN-4278 > URL: https://issues.apache.org/jira/browse/YARN-4278 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > > From the yarn-dev mailing list discussion thread > [Thread-1|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c0ee80f6f7a98a64ebd18f2be839c91156798a...@szxeml512-mbs.china.huawei.com%3E] > > [Thread-2|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c4f7812fc-ab5d-465d-ac89-824735698...@hortonworks.com%3E] > > Slider required to know about cluster nodes details for providing support for > affinity/anti-affinity on containers. > Current behavior : During life span of application , updatedNodes are sent in > allocate request only if there are any change like added/removed/'state > change' in the nodes. Otherwise cluster nodes not updated to AM. > One of the approach thought by [~ste...@apache.org] is while AM registration > let response hold the cluster nodes report -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4294) [JDK8] Fix javadoc errors caused by wrong reference
Akira AJISAKA created YARN-4294: --- Summary: [JDK8] Fix javadoc errors caused by wrong reference Key: YARN-4294 URL: https://issues.apache.org/jira/browse/YARN-4294 Project: Hadoop YARN Issue Type: Bug Components: build, documentation Reporter: Akira AJISAKA Priority: Blocker {{mvn package -Pdist -Dtar -DskipTests}} fails on JDK8 by illegal javadoc. {noformat} [ERROR] /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java:33: error: reference not found [ERROR] * @see ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) [ERROR] ^ [ERROR] /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java:33: error: reference not found [ERROR] * @see ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) [ERROR] ^ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4294) [JDK8] Fix javadoc errors caused by wrong reference
[ https://issues.apache.org/jira/browse/YARN-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned YARN-4294: --- Assignee: Akira AJISAKA > [JDK8] Fix javadoc errors caused by wrong reference > --- > > Key: YARN-4294 > URL: https://issues.apache.org/jira/browse/YARN-4294 > Project: Hadoop YARN > Issue Type: Bug > Components: build, documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Blocker > > {{mvn package -Pdist -Dtar -DskipTests}} fails on JDK8 by illegal javadoc. > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3784) Indicate preemption timout along with the list of containers to AM (preemption message)
[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970872#comment-14970872 ] Sunil G commented on YARN-3784: --- Thank you [~djp] Sure, I will rebase the patch soon. > Indicate preemption timout along with the list of containers to AM > (preemption message) > --- > > Key: YARN-3784 > URL: https://issues.apache.org/jira/browse/YARN-3784 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch > > > Currently during preemption, AM is notified with a list of containers which > are marked for preemption. Introducing a timeout duration also along with > this container list so that AM can know how much time it will get to do a > graceful shutdown to its containers (assuming one of preemption policy is > loaded in AM). > This will help in decommissioning NM scenarios, where NM will be > decommissioned after a timeout (also killing containers on it). This timeout > will be helpful to indicate AM that those containers can be killed by RM > forcefully after the timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4294) [JDK8] Fix javadoc errors caused by wrong reference and illegal tag
[ https://issues.apache.org/jira/browse/YARN-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-4294: Summary: [JDK8] Fix javadoc errors caused by wrong reference and illegal tag (was: [JDK8] Fix javadoc errors caused by wrong reference) > [JDK8] Fix javadoc errors caused by wrong reference and illegal tag > --- > > Key: YARN-4294 > URL: https://issues.apache.org/jira/browse/YARN-4294 > Project: Hadoop YARN > Issue Type: Bug > Components: build, documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Blocker > > {{mvn package -Pdist -Dtar -DskipTests}} fails on JDK8 by illegal javadoc. > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4294) [JDK8] Fix javadoc errors caused by wrong reference and illegal tag
[ https://issues.apache.org/jira/browse/YARN-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-4294: Attachment: YARN-4294.00.patch > [JDK8] Fix javadoc errors caused by wrong reference and illegal tag > --- > > Key: YARN-4294 > URL: https://issues.apache.org/jira/browse/YARN-4294 > Project: Hadoop YARN > Issue Type: Bug > Components: build, documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Blocker > Attachments: YARN-4294.00.patch > > > {{mvn package -Pdist -Dtar -DskipTests}} fails on JDK8 by illegal javadoc. > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970909#comment-14970909 ] Junping Du commented on YARN-4132: -- Thanks for bringing up the problem, [~lichangleo]! I agree that we may want different retry counts, intervals (or even retry policies) for different consumers of RMProxy. However, the approach in the current patch - a separate property that overrides a general config property at runtime - sounds a little tricky. We should consider whether there is a better way, e.g. passing the parameters when creating the RMProxy so they take priority when building the retry policies, with the config properties only serving as defaults. Thoughts? > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.2.patch, YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
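As a hedged illustration of that direction: only the RetryPolicies calls below are real Hadoop APIs; the idea of a caller-supplied policy for RMProxy is the suggestion above, not existing code.
{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public class NmRetryPolicySketch {
  // NMs, being part of the cluster, could retry far more aggressively...
  public static RetryPolicy nodeManagerPolicy() {
    return RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        Integer.MAX_VALUE, 10, TimeUnit.SECONDS);
  }

  // ...while a client keeps a bounded policy derived from the general config.
  public static RetryPolicy clientPolicy() {
    return RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        30, 30, TimeUnit.SECONDS);
  }
}
{code}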
[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970945#comment-14970945 ] Jason Lowe commented on YARN-4041: -- +1 for the latest patch, will commit this later today if there are no objections. > Slow delegation token renewal can severely prolong RM recovery > -- > > Key: YARN-4041 > URL: https://issues.apache.org/jira/browse/YARN-4041 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, > 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch > > > When the RM does a work-preserving restart it synchronously tries to renew > delegation tokens for every active application. If a token server happens to > be down or is running slow and a lot of the active apps were using tokens > from that server then it can have a huge impact on the time it takes the RM > to process the restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4294) [JDK8] Fix javadoc errors caused by wrong reference and illegal tag
[ https://issues.apache.org/jira/browse/YARN-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971020#comment-14971020 ] Hadoop QA commented on YARN-4294: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 30s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 38s | The applied patch generated 3 new checkstyle issues (total was 36, now 37). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 19s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 59m 25s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 108m 7s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768280/YARN-4294.00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 124a412 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9544/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9544/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9544/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9544/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9544/console | This message was automatically generated. > [JDK8] Fix javadoc errors caused by wrong reference and illegal tag > --- > > Key: YARN-4294 > URL: https://issues.apache.org/jira/browse/YARN-4294 > Project: Hadoop YARN > Issue Type: Bug > Components: build, documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Blocker > Attachments: YARN-4294.00.patch > > > {{mvn package -Pdist -Dtar -DskipTests}} fails on JDK8 by illegal javadoc. 
> {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2902: --- Attachment: YARN-2902.08.patch > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, > YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, > YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2902: --- Attachment: (was: YARN-2902.08.patch) > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, > YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, > YARN-2902.07.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2902: --- Attachment: YARN-2902.08.patch > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, > YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, > YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM
[ https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971085#comment-14971085 ] Pradeep Subrahmanion commented on YARN-1565: Thank you for your review comments. I will update the patch with documentation. > Add a way for YARN clients to get critical YARN system properties from the RM > - > > Key: YARN-1565 > URL: https://issues.apache.org/jira/browse/YARN-1565 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Steve Loughran > Attachments: YARN-1565-001.patch, YARN-1565-002.patch, > YARN-1565-003.patch > > > If you are trying to build up an AM request, you need to know > # the limits of memory, core &c for the chosen queue > # the existing YARN classpath > # the path separator for the target platform (so your classpath comes out > right) > # cluster OS: in case you need some OS-specific changes > The classpath can be in yarn-site.xml, but a remote client may not have that. > The site-xml file doesn't list Queue resource limits, cluster OS or the path > separator. > A way to query the RM for these values would make it easier for YARN clients > to build up AM submissions with less guesswork and client-side config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971092#comment-14971092 ] Varun Saxena commented on YARN-2902: [~jlowe], kindly review. Sorry I could not upload the patch earlier due to bandwidth issues, but I think it's still on track for 2.7.2. Coming to the patch, it handles the deletion in the NM itself. While processing the container cleanup event (after the container is killed), we transition downloading resources to FAILED. After the localizer exits, deletion is done in the finally block of LocalizerRunner, as per the suggestion given above. There is one presumably rare scenario where this deletion won't work: NM recovery is not enabled and the deletion task is scheduled, but it sits in the deletion service's executor queue because all 4 threads of the deletion service's executor (NM delete threads) are occupied. If the NM goes down before the task is taken up, the downloading resources won't be deleted. If you want this handled, we can attempt deletion in the container localizer too; I already have code for it (in earlier patches). But do we need to handle this rare case? Let me know. BTW, the patch does not apply cleanly on branch-2.7, so I will update that patch once the trunk patch is OK to go in. > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, > YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, > YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
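A schematic of the cleanup shape described above, with placeholder names rather than the code in YARN-2902.08.patch: whatever interrupts localization, any leftover DOWNLOADING paths are handed to the deletion service once the localizer exits.
{code}
import java.util.Collections;
import java.util.List;

// Schematic only: the names below are placeholders standing in for the
// LocalizerRunner internals, not the actual patch.
public class LocalizerCleanupSketch {
  interface DeletionService { void delete(String user, String path); }

  private final DeletionService delService;
  private final String user;

  LocalizerCleanupSketch(DeletionService delService, String user) {
    this.delService = delService;
    this.user = user;
  }

  public void runLocalizer() {
    try {
      startAndWaitForLocalizer();   // may be cut short by a container kill
    } finally {
      // Hand any leftover DOWNLOADING paths to the deletion service so
      // partially downloaded files cannot be orphaned.
      for (String downloading : pathsOfPendingResources()) {
        delService.delete(user, downloading);
      }
    }
  }

  private void startAndWaitForLocalizer() { /* placeholder */ }

  private List<String> pathsOfPendingResources() { return Collections.emptyList(); }
}
{code}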
[jira] [Commented] (YARN-4294) [JDK8] Fix javadoc errors caused by wrong reference and illegal tag
[ https://issues.apache.org/jira/browse/YARN-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971098#comment-14971098 ] Akira AJISAKA commented on YARN-4294: - bq. -1 checkstyle 1m 38s The applied patch generated 3 new checkstyle issues (total was 36, now 37). The warning is caused by unused imports but the imports are actually used for compiling javadoc. > [JDK8] Fix javadoc errors caused by wrong reference and illegal tag > --- > > Key: YARN-4294 > URL: https://issues.apache.org/jira/browse/YARN-4294 > Project: Hadoop YARN > Issue Type: Bug > Components: build, documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Blocker > Attachments: YARN-4294.00.patch > > > {{mvn package -Pdist -Dtar -DskipTests}} fails on JDK8 by illegal javadoc. > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
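To make the trade-off concrete, here is the general shape of such a fix (illustrative, not the exact hunk in YARN-4294.00.patch): the type named in the {{@see}} tag must be resolvable under JDK8's stricter doclint, either through an import, which checkstyle then reports as unused, or a fully qualified reference.
{code}
package org.apache.hadoop.yarn.api.protocolrecords;

import org.apache.hadoop.yarn.api.ApplicationClientProtocol;  // used only by javadoc

/**
 * Request to fail an application attempt.
 *
 * @see ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest)
 */
public abstract class FailApplicationAttemptRequest {
}
{code}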
[jira] [Commented] (YARN-4285) Display resource usage as percentage of queue and cluster in the RM UI
[ https://issues.apache.org/jira/browse/YARN-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971097#comment-14971097 ] Hadoop QA commented on YARN-4285: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 26m 8s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 9m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 53s | The applied patch generated 1 new checkstyle issues (total was 10, now 11). | | {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 50s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 34m 55s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 51s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:red}-1{color} | yarn tests | 327m 39s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 428m 42s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate | | | hadoop.yarn.server.resourcemanager.TestRM | | | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerApplicationAttempt | | | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager | | Timed out tests | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | | org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup | | | org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter | | | org.apache.hadoop.yarn.server.resourcemanager.TestSignalContainer | | | org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl | | | org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService | | | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel | | | org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA | | | org.apache.h
[jira] [Updated] (YARN-4169) jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
[ https://issues.apache.org/jira/browse/YARN-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4169: Attachment: YARN-4169.v1.004.patch Hi [~wangda], I have rebased the patch, please check. > jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels > - > > Key: YARN-4169 > URL: https://issues.apache.org/jira/browse/YARN-4169 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-4169.v1.001.patch, YARN-4169.v1.002.patch, > YARN-4169.v1.003.patch, YARN-4169.v1.004.patch > > > Test failing in [[Jenkins build > 402|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/402/testReport/junit/org.apache.hadoop.yarn.server.nodemanager/TestNodeStatusUpdaterForLabels/testNodeStatusUpdaterForNodeLabels/] > {code} > java.lang.NullPointerException: null > at java.util.HashSet.(HashSet.java:118) > at > org.apache.hadoop.yarn.nodelabels.NodeLabelTestBase.assertNLCollectionEquals(NodeLabelTestBase.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels(TestNodeStatusUpdaterForLabels.java:268) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart
[ https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971124#comment-14971124 ] Varun Saxena commented on YARN-4000: Thanks [~jianhe] for the commit and review. Thanks [~leftnoteasy], [~kasha] and [~jlowe] for the review. > RM crashes with NPE if leaf queue becomes parent queue during restart > - > > Key: YARN-4000 > URL: https://issues.apache.org/jira/browse/YARN-4000 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Fix For: 2.8.0, 2.7.2 > > Attachments: YARN-4000-branch-2.7.01.patch, YARN-4000.01.patch, > YARN-4000.02.patch, YARN-4000.03.patch, YARN-4000.04.patch, > YARN-4000.05.patch, YARN-4000.06.patch > > > This is a similar situation to YARN-2308. If an application is active in > queue A and then the RM restarts with a changed capacity scheduler > configuration where queue A becomes a parent queue to other subqueues then > the RM will crash with a NullPointerException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971133#comment-14971133 ] Hadoop QA commented on YARN-2902: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 3s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 9m 4s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 49s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 44s | The applied patch generated 6 new checkstyle issues (total was 235, now 198). | | {color:green}+1{color} | whitespace | 0m 12s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 9m 24s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 53m 30s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768301/YARN-2902.08.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 124a412 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9546/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9546/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9546/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9546/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9546/console | This message was automatically generated. > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, > YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, > YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
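The cleanup gap described in YARN-2902 above lends itself to a small illustration. The sketch below is a simplified, standalone model of a localization cache, not the actual NodeManager code: the class, state, and staleness-threshold names are made up for this example. The idea is that a periodic cleanup scan should also be able to discard unreferenced entries stuck in the DOWNLOADING state once they have been idle long enough, instead of skipping them forever.
{noformat}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified standalone model of a localization cache; not NodeManager code.
public class DownloadingCleanupSketch {
  enum State { DOWNLOADING, LOCALIZED }

  static final class CachedResource {
    State state;
    int refCount;
    long lastTouchedMillis;
    CachedResource(State state, int refCount, long lastTouchedMillis) {
      this.state = state;
      this.refCount = refCount;
      this.lastTouchedMillis = lastTouchedMillis;
    }
  }

  private final Map<String, CachedResource> cache = new ConcurrentHashMap<>();

  // Remove unreferenced LOCALIZED entries, and also unreferenced DOWNLOADING
  // entries that have been idle longer than staleMillis, so a killed
  // localization cannot orphan its partially downloaded resource forever.
  public void cleanup(long nowMillis, long staleMillis) {
    Iterator<Map.Entry<String, CachedResource>> it = cache.entrySet().iterator();
    while (it.hasNext()) {
      CachedResource r = it.next().getValue();
      boolean unreferenced = r.refCount == 0;
      boolean staleDownload = r.state == State.DOWNLOADING
          && nowMillis - r.lastTouchedMillis > staleMillis;
      if (unreferenced && (r.state == State.LOCALIZED || staleDownload)) {
        it.remove(); // a real cleanup would also delete the on-disk directory
      }
    }
  }
}
{noformat}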
[jira] [Commented] (YARN-4212) FairScheduler: Parent queues with 'Fair' policy should compute shares of all resources for its children during a recompute
[ https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971152#comment-14971152 ] Karthik Kambatla commented on YARN-4212: Patch looks fairly straightforward. Comments: # Changes to FairScheduler.java seem spurious. Can we leave them out? # Test ## Instead of editing a DRF-preemption test, can we add a new test - {{testAllocationWithMixedHierarchy}}? ## Can we try multiple hierarchies beyond just root? Maybe set root - fair, root.q1 - drf, root.q11 - fair, root.q2 - fair, root.q21 - drf, and verify jobs submitted to root.q11 and root.q21 get resources. Do we want to throw in FIFO as well somewhere in there? > FairScheduler: Parent queues with 'Fair' policy should compute shares of all > resources for its children during a recompute > -- > > Key: YARN-4212 > URL: https://issues.apache.org/jira/browse/YARN-4212 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Arun Suresh > Labels: fairscheduler > Attachments: YARN-4212.1.patch > > > The Fair Scheduler, while performing a {{recomputeShares()}} during an > {{update()}} call, uses the parent queue's policy to distribute shares to its > children. > If the parent queue's policy is 'fair', it only computes the fair share for memory and > sets the vcores fair share of its children to 0. > Assuming a situation where we have 1 parent queue with policy 'fair' and > multiple leaf queues with policy 'drf', any app submitted to the child queues > with vcore requirement > 1 will always be above fairshare, since during the > recomputeShare process, the child queues were all assigned 0 for fairshare > vcores. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
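To make the YARN-4212 description above concrete, here is a simplified, standalone sketch of the fix direction: when a parent queue recomputes shares under the 'fair' policy, every resource type should be distributed to the children, not just memory. The {{Queue}} class and field names below are hypothetical; this is not the FairScheduler implementation.
{noformat}
import java.util.Arrays;
import java.util.List;

// Hypothetical names; a standalone illustration, not the FairScheduler code.
public class FairShareSketch {
  static final class Queue {
    final String name;
    final double weight;
    long memoryShare;
    long vcoreShare;
    Queue(String name, double weight) { this.name = name; this.weight = weight; }
  }

  // Distribute the parent's resources to the children by weight, for every
  // resource type. The reported bug is equivalent to skipping the vcore line,
  // which leaves every child with a vcore fair share of 0.
  static void recomputeShares(List<Queue> children, long memory, long vcores) {
    double totalWeight = children.stream().mapToDouble(q -> q.weight).sum();
    for (Queue q : children) {
      double fraction = totalWeight == 0 ? 0 : q.weight / totalWeight;
      q.memoryShare = (long) (memory * fraction);
      q.vcoreShare = (long) (vcores * fraction);
    }
  }

  public static void main(String[] args) {
    List<Queue> children = Arrays.asList(new Queue("q1", 1.0), new Queue("q2", 3.0));
    recomputeShares(children, 8192, 8);
    children.forEach(q ->
        System.out.println(q.name + ": memory=" + q.memoryShare + " vcores=" + q.vcoreShare));
  }
}
{noformat}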
[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971154#comment-14971154 ] Jason Lowe commented on YARN-4280: -- It's a sticky problem. The problem with doing the resource check is that it can prevent the reservation from being fulfilled indefinitely. For example, consider a situation like this: * root queue (near 100% utilization) ** parent queue P (near max capacity) *** leaf queue A (well under capacity) *** leaf queue B (almost all of P's utilization) ** leaf queue C (the remainder of root - P) We have an application X in queue A that needs a large resource. If we do a limit check against P's max capacity or the root's max capacity, it won't fit. If we don't make the reservation, then the app in A could be indefinitely postponed. So let's say we go ahead and let the reservation occur. If the resource to fill that reservation was freed from within the P queue hierarchy then we're OK. If it's not, then we cannot fulfill the reservation otherwise we run over P's max capacity. So in the latter case, do we leave the reservation? Does this in turn prevent apps in C from making progress because app X's reservations start locking down the cluster, waiting for the apps in queue B to free up resources? Offhand I don't have a great answer for how to tackle the problem. Seems like either we need to start locking down parts of the cluster and potentially leave resources fallow, even for other queues outside of P, to make sure app X will eventually get something or we keep app X from reserving and leave it vulnerable to indefinite postponement despite containers churning in queue B. It's like we need to make a reservation _within_ the P queue hierarchy for this scenario, to make sure queue B isn't allowed to grab more resources while app X is waiting, but not sure that's right either. > CapacityScheduler reservations may not prevent indefinite postponement on a > busy cluster > > > Key: YARN-4280 > URL: https://issues.apache.org/jira/browse/YARN-4280 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.6.1, 2.8.0, 2.7.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > > Consider the following scenario: > There are 2 queues A(25% of the total capacity) and B(75%), both can run at > total cluster capacity. There are 2 applications, appX that runs on Queue A, > always asking for 1G containers(non-AM) and appY runs on Queue B asking for 2 > GB containers. > The user limit is high enough for the application to reach 100% of the > cluster resource. > appX is running at total cluster capacity, full with 1G containers releasing > only one container at a time. appY comes in with a request of 2GB container > but only 1 GB is free. Ideally, since appY is in the underserved queue, it > has higher priority and should reserve for its 2 GB request. Since this > request puts the alloc+reserve above total capacity of the cluster, > reservation is not made. appX comes in with a 1GB request and since 1GB is > still available, the request is allocated. > This can continue indefinitely causing priority inversion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue
[ https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated YARN-2913: -- Issue Type: Improvement (was: Bug) > Fair scheduler should have ability to set MaxResourceDefault for each queue > --- > > Key: YARN-2913 > URL: https://issues.apache.org/jira/browse/YARN-2913 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, > YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch > > > Queues that are created on the fly have the max resource of the entire > cluster. Fair Scheduler should have a default maxResource to control the > maxResource of those queues -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971168#comment-14971168 ] Hadoop QA commented on YARN-2902: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 23s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 51s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 16s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 6 new checkstyle issues (total was 235, now 198). | | {color:green}+1{color} | whitespace | 0m 12s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 42s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 9m 5s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 53m 48s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768305/YARN-2902.08.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 124a412 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9547/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9547/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9547/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9547/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9547/console | This message was automatically generated. > Killing a container that is localizing can orphan resources in the > DOWNLOADING state > > > Key: YARN-2902 > URL: https://issues.apache.org/jira/browse/YARN-2902 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > Attachments: YARN-2902.002.patch, YARN-2902.03.patch, > YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, > YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.patch > > > If a container is in the process of localizing when it is stopped/killed then > resources are left in the DOWNLOADING state. 
If no other container comes > along and requests these resources they linger around with no reference > counts but aren't cleaned up during normal cache cleanup scans since it will > never delete resources in the DOWNLOADING state even if their reference count > is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971176#comment-14971176 ] Nathan Roberts commented on YARN-4287: -- Thanks for the comments. You're right that the logic can be simplified in that area. Let me do that and post a followup patch. > Capacity Scheduler: Rack Locality improvement > - > > Key: YARN-4287 > URL: https://issues.apache.org/jira/browse/YARN-4287 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-4287-v2.patch, YARN-4287.patch > > > YARN-4189 does an excellent job describing the issues with the current delay > scheduling algorithms within the capacity scheduler. The design proposal also > seems like a good direction. > This jira proposes a simple interim solution to the key issue we've been > experiencing on a regular basis: > - rackLocal assignments trickle out due to nodeLocalityDelay. This can have > significant impact on things like CombineFileInputFormat which targets very > specific nodes in its split calculations. > I'm not sure when YARN-4189 will become reality so I thought a simple interim > patch might make sense. The basic idea is simple: > 1) Separate delays for rackLocal, and OffSwitch (today there is only 1) > 2) When we're getting rackLocal assignments, subsequent rackLocal assignments > should not be delayed > Patch will be uploaded shortly. No big deal if the consensus is to go > straight to YARN-4189. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue
[ https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971202#comment-14971202 ] Hudson commented on YARN-2913: -- FAILURE: Integrated in Hadoop-trunk-Commit #8695 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8695/]) YARN-2913. Fair scheduler should have ability to set MaxResourceDefault (mingma: rev 934d96a334598fcf0e5aba2043ff539469025f69) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/CHANGES.txt > Fair scheduler should have ability to set MaxResourceDefault for each queue > --- > > Key: YARN-2913 > URL: https://issues.apache.org/jira/browse/YARN-2913 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.8.0 > > Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, > YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch > > > Queues that are created on the fly have the max resource of the entire > cluster. Fair Scheduler should have a default maxResource to control the > maxResource of those queues -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971201#comment-14971201 ] Hudson commented on YARN-4009: -- FAILURE: Integrated in Hadoop-trunk-Commit #8695 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8695/]) YARN-4009. CORS support for ResourceManager REST API. ( Varun Vasudev (jeagles: rev f8adeb712dc834c27cec15c04a986f2f635aba83) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/http/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/HttpCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/http/TestCrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestHttpCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.8.patch, > YARN-4009.LOGGING.patch, YARN-4009.LOGGING.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. 
> It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4294) [JDK8] Fix javadoc errors caused by wrong reference and illegal tag
[ https://issues.apache.org/jira/browse/YARN-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971213#comment-14971213 ] Steve Loughran commented on YARN-4294: -- +1. If you can track down the patch which broke this, can you link this up as "caused by". Just to keep that "what-broke-what" history up to date > [JDK8] Fix javadoc errors caused by wrong reference and illegal tag > --- > > Key: YARN-4294 > URL: https://issues.apache.org/jira/browse/YARN-4294 > Project: Hadoop YARN > Issue Type: Bug > Components: build, documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Blocker > Attachments: YARN-4294.00.patch > > > {{mvn package -Pdist -Dtar -DskipTests}} fails on JDK8 by illegal javadoc. > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
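For readers unfamiliar with this class of JDK 8 javadoc failure, the fragment below illustrates the two usual ways a "reference not found" error on an {{@see}} tag is resolved: import the referenced type, or fully qualify it inside the tag. This is an illustration only, not the actual YARN-4294 patch, and the class name is hypothetical.
{noformat}
package org.apache.hadoop.yarn.api.protocolrecords;

// Option 1: import the protocol class so the short name used in the tag resolves.
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;

// Illustration only; this class is hypothetical and not part of the patch.
public abstract class JavadocReferenceSketch {
  /**
   * With the import above, the short-name reference resolves under JDK 8.
   * Option 2 (second tag): fully qualify the reference inside the tag instead.
   *
   * @see ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest)
   * @see org.apache.hadoop.yarn.api.ApplicationClientProtocol#failApplicationAttempt(org.apache.hadoop.yarn.api.protocolrecords.FailApplicationAttemptRequest)
   */
  public abstract void fail();
}
{noformat}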
[jira] [Commented] (YARN-4294) [JDK8] Fix javadoc errors caused by wrong reference and illegal tag
[ https://issues.apache.org/jira/browse/YARN-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971220#comment-14971220 ] Tsuyoshi Ozawa commented on YARN-4294: -- +1 > [JDK8] Fix javadoc errors caused by wrong reference and illegal tag > --- > > Key: YARN-4294 > URL: https://issues.apache.org/jira/browse/YARN-4294 > Project: Hadoop YARN > Issue Type: Bug > Components: build, documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Blocker > Attachments: YARN-4294.00.patch > > > {{mvn package -Pdist -Dtar -DskipTests}} fails on JDK8 by illegal javadoc. > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java:33: > error: reference not found > [ERROR] * @see > ApplicationClientProtocol#failApplicationAttempt(FailApplicationAttemptRequest) > [ERROR] ^ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status
[ https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971229#comment-14971229 ] Ram Venkatesh commented on YARN-1402: - Unfortunately, the addition of two new abstract methods to ApplicationReport introduced by this patch is a breaking change to a Public Stable API. Can we revert and add stub implementations instead that can be overridden? Thanks! > Related Web UI, CLI changes on exposing client API to check log aggregation > status > -- > > Key: YARN-1402 > URL: https://issues.apache.org/jira/browse/YARN-1402 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1402.1.patch, YARN-1402.2.patch, > YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch, YARN-1402.4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971242#comment-14971242 ] Tsuyoshi Ozawa commented on YARN-3528: -- [~brahmareddy] thank you for taking this issue. could you update 008 patch again? > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, > YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-branch2.patch, > YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
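The dynamic port allocation that YARN-3528 asks for is usually achieved by binding to port 0 and letting the OS choose a free ephemeral port. The helper below is a minimal sketch of that approach (the class name is made up); note that a port obtained this way can in principle be taken by another process before the test binds to it, so letting the server itself bind to 0 and asking it for its actual port is even safer when the service supports it.
{noformat}
import java.io.IOException;
import java.net.ServerSocket;

// Minimal sketch of the dynamic-port approach; the class name is made up.
public final class FreePortFinder {
  private FreePortFinder() { }

  // Bind to port 0 so the OS picks a free ephemeral port, then release it
  // and hand the number to the test instead of a hard-coded 12345.
  public static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }
}
{noformat}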
[jira] [Updated] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3738: Attachment: YARN-3738-v3.patch Retriggering jenkins with same patch > Add support for recovery of reserved apps (running under dynamic queues) to > Capacity Scheduler > -- > > Key: YARN-3738 > URL: https://issues.apache.org/jira/browse/YARN-3738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3738-v2.patch, YARN-3738-v3.patch, > YARN-3738-v3.patch, YARN-3738.patch > > > YARN-3736 persists the current state of the Plan to the RMStateStore. This > JIRA covers recovery of the Plan, i.e. dynamic reservation queues with > associated apps as part Capacity Scheduler failover mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4279) Mark ApplicationId and ApplicationAttemptId static methods as @Public, @Unstable
[ https://issues.apache.org/jira/browse/YARN-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971310#comment-14971310 ] Steve Loughran commented on YARN-4279: -- No tests needed. Any reviewers? > Mark ApplicationId and ApplicationAttemptId static methods as @Public, > @Unstable > > > Key: YARN-4279 > URL: https://issues.apache.org/jira/browse/YARN-4279 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: YARN-4279-001.patch > > Original Estimate: 0.25h > Remaining Estimate: 0.25h > > The classes {{ApplicationId}} and {{ApplicationAttemptId}} both have > {{newInstance()}} methods tagged as {{@Private}}. Yet they are useful in > testing, as the alternative is to create and configure the PBImpl classes > -which are significantly more private. > The fact that mapreduce's {{MRBuilderUtils}} uses one of the methods shows > that YARN apps do need access to the methods. > Marking them as public would make it clear that other YARN apps were using > them for their production or test code, rather than today, where they are > used and depended on, yet without the YARN team's knowledge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
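For context, the usage the issue has in mind looks roughly like the sketch below: test code builds ids through the public static factory methods on the record classes instead of instantiating the PBImpl classes directly. The values are arbitrary test data.
{noformat}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Test-style usage of the factory methods the issue wants marked @Public/@Unstable.
public class IdFactorySketch {
  public static void main(String[] args) {
    // Arbitrary test values: a cluster timestamp and a sequence number.
    ApplicationId appId = ApplicationId.newInstance(System.currentTimeMillis(), 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);
    System.out.println(appId + " / " + attemptId);
  }
}
{noformat}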
[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue
[ https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971335#comment-14971335 ] Hudson commented on YARN-2913: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #574 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/574/]) YARN-2913. Fair scheduler should have ability to set MaxResourceDefault (mingma: rev 934d96a334598fcf0e5aba2043ff539469025f69) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java > Fair scheduler should have ability to set MaxResourceDefault for each queue > --- > > Key: YARN-2913 > URL: https://issues.apache.org/jira/browse/YARN-2913 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.8.0 > > Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, > YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch > > > Queues that are created on the fly have the max resource of the entire > cluster. Fair Scheduler should have a default maxResource to control the > maxResource of those queues -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971334#comment-14971334 ] Hudson commented on YARN-4009: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #574 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/574/]) YARN-4009. CORS support for ResourceManager REST API. ( Varun Vasudev (jeagles: rev f8adeb712dc834c27cec15c04a986f2f635aba83) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/http/TestCrossOriginFilter.java * hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/http/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestHttpCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/HttpCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.8.patch, > YARN-4009.LOGGING.patch, YARN-4009.LOGGING.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. 
For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4264) In-VM test ATS instances fail with metrics already registered
[ https://issues.apache.org/jira/browse/YARN-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-4264: - Priority: Blocker (was: Major) > In-VM test ATS instances fail with metrics already registered > - > > Key: YARN-4264 > URL: https://issues.apache.org/jira/browse/YARN-4264 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Priority: Blocker > > Testing my SPARK-1537 code against branch-2 triggers stack traces due to > failed attempts to re-register metrics. This is with code which works against > 2.7.1, so it's a regression. > Either the timeline server needs to unregister its metrics on shutdown, or > ATS adds an option to disable metrics for test purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4284) condition for AM blacklisting is too narrow
[ https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971356#comment-14971356 ] Sangjin Lee commented on YARN-4284: --- [~jlowe], could you kindly let me know what you think? Thanks! > condition for AM blacklisting is too narrow > --- > > Key: YARN-4284 > URL: https://issues.apache.org/jira/browse/YARN-4284 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4284.001.patch > > > Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the > next app attempt can be assigned to a different node. > However, currently the condition under which the node gets blacklisted is > limited to {{DISKS_FAILED}}. There are a whole host of other issues that may > cause the failure, for which we want to locate the AM elsewhere; e.g. disks > full, JVM crashes, memory issues, etc. > Since the AM blacklisting is per-app, there is little practical downside in > blacklisting the nodes on *any failure* (although it might lead to > blacklisting the node more aggressively than necessary). I would propose > locating the next app attempt to a different node on any failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status
[ https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971354#comment-14971354 ] Junping Du commented on YARN-1402: -- Thanks for bringing up this good point, [~venkateshrin]! This is something worth discussing. For now, adding new APIs (fields) to the records API is not treated as an incompatible change to the public interface, since an old-version client can still talk to a new-version server thanks to our PB implementations. However, this could break users' inherited classes, if any. Maybe we should treat this as an incompatible change at the class level. Thoughts? > Related Web UI, CLI changes on exposing client API to check log aggregation > status > -- > > Key: YARN-1402 > URL: https://issues.apache.org/jira/browse/YARN-1402 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-1402.1.patch, YARN-1402.2.patch, > YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch, YARN-1402.4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
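The compatibility concern raised here can be shown with a small generic example, which is not the actual ApplicationReport code: adding an abstract method to a published abstract class breaks every existing subclass at compile time, while a concrete stub with a default value keeps old subclasses compiling and can still be overridden.
{noformat}
// Generic illustration of the compatibility point; not the ApplicationReport class.
public abstract class ReportBase {

  // Pre-existing API surface that user subclasses already implement.
  public abstract String getName();

  // Breaking: uncommenting this abstract method stops every existing subclass
  // from compiling, because none of them implement it.
  // public abstract String getLogAggregationStatus();

  // Non-breaking alternative: a concrete stub returning a default value, which
  // newer implementations may override.
  public String getLogAggregationStatus() {
    return "NOT_AVAILABLE";
  }
}
{noformat}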
[jira] [Commented] (YARN-4169) jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
[ https://issues.apache.org/jira/browse/YARN-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971355#comment-14971355 ] Hadoop QA commented on YARN-4169: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 29s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 23s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 20s | The applied patch generated 1 new checkstyle issues (total was 30, now 31). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 21s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 8m 57s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 60m 30s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 115m 42s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-common | | Failed unit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens | | | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.TestRMAdminService | | | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | hadoop.yarn.server.resourcemanager.TestClientRMService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768309/YARN-4169.v1.004.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 35a303d | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9548/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9548/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9548/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9548/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html | | hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9548/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9548/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9548/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9548/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9548/console | This message was automatically generated. > jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels > - > > Key: YARN-4169 > URL: https://issues.apache.org/jira/browse/YARN-4169 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Naganarasimha G R >Priority: Critical >
[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3528: --- Attachment: YARN-3528-009.patch > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, > YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-009.patch, > YARN-3528-branch2.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971363#comment-14971363 ] Brahma Reddy Battula commented on YARN-3528: [~ozawa] thanks a lot for taking a look into this issue.. rebased patch..Kindly review.. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, > YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-009.patch, > YARN-3528-branch2.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971365#comment-14971365 ] Hudson commented on YARN-4009: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #587 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/587/]) YARN-4009. CORS support for ResourceManager REST API. ( Varun Vasudev (jeagles: rev f8adeb712dc834c27cec15c04a986f2f635aba83) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/http/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/http/TestCrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/HttpCrossOriginFilterInitializer.java * hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestHttpCrossOriginFilterInitializer.java * hadoop-yarn-project/CHANGES.txt > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.8.patch, > YARN-4009.LOGGING.patch, YARN-4009.LOGGING.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. 
For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4221) Store user in app to flow table
[ https://issues.apache.org/jira/browse/YARN-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971403#comment-14971403 ] Sangjin Lee commented on YARN-4221: --- I've reviewed the patch, and it LGTM. I'll let others comment on this today before I commit. Thanks! > Store user in app to flow table > --- > > Key: YARN-4221 > URL: https://issues.apache.org/jira/browse/YARN-4221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4221-YARN-2928.01.patch, > YARN-4221-YARN-2928.02.patch, YARN-4221-YARN-2928.03.patch > > > We should store user as well in in app to flow table. > For queries where user is not supplied and flow context can be retrieved from > app to flow table, we should take the user from app to flow table instead of > considering UGI as default user. > This is as per discussion on YARN-3864 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API
[ https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4292: -- Attachment: 0001-YARN-4292.patch Attaching an initial work-in-progress patch. I will share an updated patch with more test cases soon. I am considering both NodeResourceUtilization and ContainerAggregatedResourceUtilization, as both are available from the node manager. Hope that's fine. > ResourceUtilization should be a part of NodeInfo REST API > - > > Key: YARN-4292 > URL: https://issues.apache.org/jira/browse/YARN-4292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0001-YARN-4292.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971491#comment-14971491 ] Hudson commented on YARN-4009: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1310 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1310/]) YARN-4009. CORS support for ResourceManager REST API. ( Varun Vasudev (jeagles: rev f8adeb712dc834c27cec15c04a986f2f635aba83) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestHttpCrossOriginFilterInitializer.java * hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/HttpCrossOriginFilterInitializer.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/http/TestCrossOriginFilter.java * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/http/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.8.patch, > YARN-4009.LOGGING.patch, YARN-4009.LOGGING.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. 
For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue
[ https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971492#comment-14971492 ] Hudson commented on YARN-2913: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1310 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1310/]) YARN-2913. Fair scheduler should have ability to set MaxResourceDefault (mingma: rev 934d96a334598fcf0e5aba2043ff539469025f69) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/CHANGES.txt > Fair scheduler should have ability to set MaxResourceDefault for each queue > --- > > Key: YARN-2913 > URL: https://issues.apache.org/jira/browse/YARN-2913 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.8.0 > > Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, > YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch > > > Queues that are created on the fly have the max resource of the entire > cluster. Fair Scheduler should have a default maxResource to control the > maxResource of those queues -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4284) condition for AM blacklisting is too narrow
[ https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971510#comment-14971510 ] Jason Lowe commented on YARN-4284: -- Sorry for arriving a bit late. I agree that DISKS_FAILED is too narrow of a failure to use as a trigger, and it's probably better to err on the side of covering too many failure reasons than not enough. Patch looks pretty good, but I agree that we should not be blacklisting on preempted containers. Those have nothing to do with the AM or the node, and we shouldn't make the reschedule of a preempted AM more difficult as a result. There should be a test for that scenario. > condition for AM blacklisting is too narrow > --- > > Key: YARN-4284 > URL: https://issues.apache.org/jira/browse/YARN-4284 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4284.001.patch > > > Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the > next app attempt can be assigned to a different node. > However, currently the condition under which the node gets blacklisted is > limited to {{DISKS_FAILED}}. There are a whole host of other issues that may > cause the failure, for which we want to locate the AM elsewhere; e.g. disks > full, JVM crashes, memory issues, etc. > Since the AM blacklisting is per-app, there is little practical downside in > blacklisting the nodes on *any failure* (although it might lead to > blacklisting the node more aggressively than necessary). I would propose > locating the next app attempt to a different node on any failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
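A rough sketch of the widened trigger being discussed, not the committed change, might look like the following: blacklist the AM's node on any failed exit status, but skip exit reasons that say nothing about the node itself, such as preemption or an RM-initiated kill. The helper class name is hypothetical; the exit-status constants are from {{ContainerExitStatus}}.
{noformat}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

// Hypothetical helper sketching the widened blacklist trigger; not the actual patch.
public final class AmBlacklistPolicySketch {
  private AmBlacklistPolicySketch() { }

  public static boolean shouldBlacklistNode(int amContainerExitStatus) {
    switch (amContainerExitStatus) {
      case ContainerExitStatus.SUCCESS:
      case ContainerExitStatus.PREEMPTED:
      case ContainerExitStatus.KILLED_BY_RESOURCEMANAGER:
      case ContainerExitStatus.ABORTED:
        // These outcomes say nothing about the health of the node.
        return false;
      default:
        // DISKS_FAILED, JVM crashes, memory kills, and other failures are all
        // reasons to place the next attempt on a different node.
        return true;
    }
  }
}
{noformat}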
[jira] [Updated] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM
[ https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Subrahmanion updated YARN-1565: --- Attachment: YARN-1565-004.patch Attached latest patch. Made 'classPath' public in SystemPropertyInfo. Added documentation. > Add a way for YARN clients to get critical YARN system properties from the RM > - > > Key: YARN-1565 > URL: https://issues.apache.org/jira/browse/YARN-1565 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Steve Loughran > Attachments: YARN-1565-001.patch, YARN-1565-002.patch, > YARN-1565-003.patch, YARN-1565-004.patch > > > If you are trying to build up an AM request, you need to know > # the limits of memory, core &c for the chosen queue > # the existing YARN classpath > # the path separator for the target platform (so your classpath comes out > right) > # cluster OS: in case you need some OS-specific changes > The classpath can be in yarn-site.xml, but a remote client may not have that. > The site-xml file doesn't list Queue resource limits, cluster OS or the path > separator. > A way to query the RM for these values would make it easier for YARN clients > to build up AM submissions with less guesswork and client-side config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971545#comment-14971545 ] Hadoop QA commented on YARN-3738: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 15s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | javac | 8m 51s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 11m 30s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 59s | The applied patch generated 2 new checkstyle issues (total was 189, now 189). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 59m 49s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 104m 48s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768332/YARN-3738-v3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 934d96a | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/9549/artifact/patchprocess/diffJavacWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9549/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9549/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9549/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9549/console | This message was automatically generated. > Add support for recovery of reserved apps (running under dynamic queues) to > Capacity Scheduler > -- > > Key: YARN-3738 > URL: https://issues.apache.org/jira/browse/YARN-3738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3738-v2.patch, YARN-3738-v3.patch, > YARN-3738-v3.patch, YARN-3738.patch > > > YARN-3736 persists the current state of the Plan to the RMStateStore. This > JIRA covers recovery of the Plan, i.e. dynamic reservation queues with > associated apps as part Capacity Scheduler failover mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4285) Display resource usage as percentage of queue and cluster in the RM UI
[ https://issues.apache.org/jira/browse/YARN-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4285: Attachment: YARN-4285.004.patch The failing tests are due to a deadlock caused by calling getQueueInfo within the getResourceUsageReport() function in SchedulerApplicationAttempt. After speaking with [~leftnoteasy] offline, he suggested that there's no need for the getQueueInfo call to be synchronized. Uploaded a new patch with the fix. > Display resource usage as percentage of queue and cluster in the RM UI > -- > > Key: YARN-4285 > URL: https://issues.apache.org/jira/browse/YARN-4285 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4285.001.patch, YARN-4285.002.patch, > YARN-4285.003.patch, YARN-4285.004.patch > > > Currently, we display the memory and vcores allocated to an app in the RM UI. > It would be useful to display the resources consumed as a %of the queue and > the cluster to identify apps that are using a lot of resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
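The deadlock described here is the usual inconsistent lock-ordering cycle between the scheduler lock and the application-attempt lock. Below is a self-contained toy illustration (these are not the real YARN classes) of why removing synchronization from the nested getQueueInfo call breaks the cycle.
{code:java}
// Simplified illustration (hypothetical classes) of the lock-ordering problem:
// thread 1 holds the attempt lock and asks the scheduler for queue info, while
// thread 2 holds the scheduler lock and asks the attempt for its usage report.
public class LockOrderingDemo {

  static class Scheduler {
    synchronized String getQueueInfo() {        // scheduler lock
      return "queueA";
    }
    synchronized void updateApp(Attempt a) {    // scheduler lock -> attempt lock
      a.getResourceUsageReport();
    }
  }

  static class Attempt {
    private final Scheduler scheduler;
    Attempt(Scheduler s) { this.scheduler = s; }

    synchronized String getResourceUsageReport() {  // attempt lock
      // Calling back into a *synchronized* scheduler method from here creates the
      // attempt-lock -> scheduler-lock edge; combined with updateApp() above, two
      // threads can block each other forever. Making getQueueInfo() lock-free, as
      // suggested in the comment, removes that edge.
      return scheduler.getQueueInfo() + ": 1024MB";
    }
  }

  public static void main(String[] args) {
    Scheduler s = new Scheduler();
    Attempt a = new Attempt(s);
    System.out.println(a.getResourceUsageReport()); // single-threaded, so it completes
  }
}
{code}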
[jira] [Updated] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-4287: - Attachment: YARN-4287-v3.patch V3 of patch. Thanks again for the comments. bq. RACK_LOCALITY_EXTRA_DELAY -> RACK_LOCALITY_DELAY, same as configuration property name (rack-locality-delay) Done - changed to absolute instead of relative to nodeLocality bq. Do you think if is it a good idea to separate old rack-locality-delay computation (using getLocalityWaitFactor) and new rack-locality-delay config? Now rack-locality-delay = min(old-computed-delay, new-specified-delay), since the getLocalityWaitFactor has some flaws, I think we can make this configurable so user can choose to use specified or computed. I simplified the code a little in this area to make it easier to see where the computed-locality-delay is used. I didn't separate them in this version of the patch because I still want to be able to specify rack-locality-delay BUT have the computed delay take effect when an application is not asking for locality OR is really small. This is a very important capability for at least our use cases. My opinion is that we shouldn't make it configurable to get the old behavior. I can be convinced otherwise, if that's what folks want. Here's my reasoning: - This is a behavior change, but I can't think of any good cases where someone would prefer the old behavior to the new. Let me know if you can think of some. - Node locality might go down a little bit but I think it's quite unlikely this will happen in practice. As soon as it sees a node-local assignment, it immediately goes back to waiting for node-locality - so it's quite hard to only get rack locality when there is node locality to be had. - Rack locality will go up because previously the computedDelay used for OFFSWITCH would actually kick-in prior to a rack-local opportunity, which wasn't ideal. I would think this would offset any node locality we lost. > Capacity Scheduler: Rack Locality improvement > - > > Key: YARN-4287 > URL: https://issues.apache.org/jira/browse/YARN-4287 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-4287-v2.patch, YARN-4287-v3.patch, YARN-4287.patch > > > YARN-4189 does an excellent job describing the issues with the current delay > scheduling algorithms within the capacity scheduler. The design proposal also > seems like a good direction. > This jira proposes a simple interim solution to the key issue we've been > experiencing on a regular basis: > - rackLocal assignments trickle out due to nodeLocalityDelay. This can have > significant impact on things like CombineFileInputFormat which targets very > specific nodes in its split calculations. > I'm not sure when YARN-4189 will become reality so I thought a simple interim > patch might make sense. The basic idea is simple: > 1) Separate delays for rackLocal, and OffSwitch (today there is only 1) > 2) When we're getting rackLocal assignments, subsequent rackLocal assignments > should not be delayed > Patch will be uploaded shortly. No big deal if the consensus is to go > straight to YARN-4189. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
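A rough sketch of the delay selection discussed above, using invented names rather than the patch's actual code: the effective rack delay is the configured absolute value capped by a delay computed from the application's own requests, so small apps or apps with no rack preference do not wait.
{code:java}
// Hypothetical illustration of "rack-locality-delay = min(computed, configured)".
public final class RackLocalityDelay {
  private RackLocalityDelay() {}

  /**
   * @param configuredRackDelay absolute scheduling-opportunity count from config
   * @param requestedContainers how many containers the app is asking for
   * @param uniqueRackRequests  how many distinct racks the app asked for
   * @param clusterNodes        number of nodes in the cluster
   */
  public static int effectiveRackDelay(int configuredRackDelay,
      int requestedContainers, int uniqueRackRequests, int clusterNodes) {
    // Stand-in for the "computed" delay (the real code derives it from a locality
    // wait factor); tiny apps or apps with no rack preference get roughly zero.
    int computed = (uniqueRackRequests == 0)
        ? 0
        : Math.min(requestedContainers, clusterNodes);
    return Math.min(configuredRackDelay, computed);
  }

  public static void main(String[] args) {
    System.out.println(effectiveRackDelay(40, 2, 1, 4000));    // 2: small app, short wait
    System.out.println(effectiveRackDelay(40, 5000, 3, 4000)); // 40: capped at configured value
    System.out.println(effectiveRackDelay(40, 5000, 0, 4000)); // 0: no rack preference
  }
}
{code}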
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971567#comment-14971567 ] Hadoop QA commented on YARN-3528: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 10m 54s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. | | {color:green}+1{color} | javac | 10m 40s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 28s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 17s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 2m 11s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 45s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 8m 47s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 9m 30s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 49m 55s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768340/YARN-3528-009.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / eb6379c | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9550/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9550/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9550/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9550/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9550/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9550/console | This message was automatically generated. > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, > YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528-009.patch, > YARN-3528-branch2.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. 
Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
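A minimal, generic way to do the dynamic port allocation the issue asks for is to let the OS assign an ephemeral port before binding the test service; the sketch below is independent of the attached patches.
{code:java}
import java.io.IOException;
import java.net.ServerSocket;

// Ask the OS for a free ephemeral port instead of hard-coding 12345.
// Note: there is a small race between closing the probe socket and binding the
// real service, which is why retry loops are also common in test setups.
public final class FreePortFinder {
  private FreePortFinder() {}

  public static int findFreePort() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("bind the test NodeManager to port " + findFreePort());
  }
}
{code}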
[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue
[ https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971576#comment-14971576 ] Hudson commented on YARN-2913: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #588 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/588/]) YARN-2913. Fair scheduler should have ability to set MaxResourceDefault (mingma: rev 934d96a334598fcf0e5aba2043ff539469025f69) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md > Fair scheduler should have ability to set MaxResourceDefault for each queue > --- > > Key: YARN-2913 > URL: https://issues.apache.org/jira/browse/YARN-2913 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.8.0 > > Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, > YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch > > > Queues that are created on the fly have the max resource of the entire > cluster. Fair Scheduler should have a default maxResource to control the > maxResource of those queues -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-3738: - Attachment: YARN-3738-v4.patch Updating the patch to fix the javac warning and checkstyle issue. There is still one checkstyle issue that cannot be resolved, which is that CapacityScheduler.java exceeds 2000 lines. > Add support for recovery of reserved apps (running under dynamic queues) to > Capacity Scheduler > -- > > Key: YARN-3738 > URL: https://issues.apache.org/jira/browse/YARN-3738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3738-v2.patch, YARN-3738-v3.patch, > YARN-3738-v3.patch, YARN-3738-v4.patch, YARN-3738.patch > > > YARN-3736 persists the current state of the Plan to the RMStateStore. This > JIRA covers recovery of the Plan, i.e. dynamic reservation queues with > associated apps as part Capacity Scheduler failover mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4283) hadoop-yarn Avoid unsafe split and append on fields that might be IPv6 literals
[ https://issues.apache.org/jira/browse/YARN-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemanja Matkovic resolved YARN-4283. Resolution: Duplicate > hadoop-yarn Avoid unsafe split and append on fields that might be IPv6 > literals > --- > > Key: YARN-4283 > URL: https://issues.apache.org/jira/browse/YARN-4283 > Project: Hadoop YARN > Issue Type: Task >Reporter: Nemanja Matkovic > Labels: ipv6 > Original Estimate: 48h > Remaining Estimate: 48h > > hadoop-yarn part of HADOOP-12122 task -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971665#comment-14971665 ] Hudson commented on YARN-4009: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2520 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2520/]) YARN-4009. CORS support for ResourceManager REST API. ( Varun Vasudev (jeagles: rev f8adeb712dc834c27cec15c04a986f2f635aba83) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/HttpCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/http/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/http/TestCrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestHttpCrossOriginFilterInitializer.java > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.8.patch, > YARN-4009.LOGGING.patch, YARN-4009.LOGGING.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. 
For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
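For the gist of what a cross-origin filter does, here is a deliberately simplified servlet filter that adds the CORS response headers. It is only an illustration and not the committed CrossOriginFilter, which also validates origins, methods and headers against configuration.
{code:java}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Toy CORS filter: echo the Origin back and allow simple cross-origin reads so a
// browser UI (e.g. the Tez UI mentioned above) can call the REST endpoints.
public class SimpleCorsFilter implements Filter {
  @Override
  public void init(FilterConfig filterConfig) {
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest request = (HttpServletRequest) req;
    HttpServletResponse response = (HttpServletResponse) res;
    String origin = request.getHeader("Origin");
    if (origin != null) {
      response.setHeader("Access-Control-Allow-Origin", origin);
      response.setHeader("Access-Control-Allow-Methods", "GET, OPTIONS");
      response.setHeader("Access-Control-Allow-Headers", "Content-Type");
    }
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() {
  }
}
{code}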
[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue
[ https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971666#comment-14971666 ] Hudson commented on YARN-2913: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2520 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2520/]) YARN-2913. Fair scheduler should have ability to set MaxResourceDefault (mingma: rev 934d96a334598fcf0e5aba2043ff539469025f69) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java > Fair scheduler should have ability to set MaxResourceDefault for each queue > --- > > Key: YARN-2913 > URL: https://issues.apache.org/jira/browse/YARN-2913 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.8.0 > > Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, > YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch > > > Queues that are created on the fly have the max resource of the entire > cluster. Fair Scheduler should have a default maxResource to control the > maxResource of those queues -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4286) yarn-default.xml has a typo: yarn-nodemanager.local-dirs should be yarn.nodemanager.local-dirs
[ https://issues.apache.org/jira/browse/YARN-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote resolved YARN-4286. --- Resolution: Duplicate Turns out this is already reported and a patch is available, so I'm closing this one as duplicate. > yarn-default.xml has a typo: yarn-nodemanager.local-dirs should be > yarn.nodemanager.local-dirs > -- > > Key: YARN-4286 > URL: https://issues.apache.org/jira/browse/YARN-4286 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Dustin Cote >Priority: Trivial > Labels: newbie > > In the yarn-default.xml, the property yarn.nodemanager.local-dirs is > referenced as yarn-nodemanager.local-dirs in multiple places. This should be > a straightforward fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3784) Indicate preemption timeout along with the list of containers to AM (preemption message)
[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3784: -- Attachment: 0003-YARN-3784.patch Thank you [~djp] and [~leftnoteasy]. Rebasing the patch against trunk. > Indicate preemption timeout along with the list of containers to AM > (preemption message) > --- > > Key: YARN-3784 > URL: https://issues.apache.org/jira/browse/YARN-3784 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch, > 0003-YARN-3784.patch > > > Currently during preemption, the AM is notified with a list of containers which > are marked for preemption. Introducing a timeout duration along with > this container list lets the AM know how much time it will get to do a > graceful shutdown of its containers (assuming a preemption policy is > loaded in the AM). > This will help in decommissioning NM scenarios, where the NM will be > decommissioned after a timeout (also killing containers on it). This timeout > will be helpful to indicate to the AM that those containers can be killed by the RM > forcefully after the timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
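As a hedged illustration of what the proposed timeout enables on the AM side (all names below are invented, not the real AM/RM protocol records), the AM can drain the listed containers and release them before the advertised deadline.
{code:java}
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical AM-side handling of a preemption notice that carries a timeout.
// Container ids are plain Strings purely to keep the sketch self-contained.
public class GracefulPreemptionHandler {

  public void onPreemptionNotice(List<String> containersToPreempt,
                                 long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    for (String containerId : containersToPreempt) {
      checkpointOrDrain(containerId);           // app-specific graceful shutdown
    }
    // Leave a safety margin before the RM (or a decommissioning NM) kills them.
    long margin = TimeUnit.SECONDS.toMillis(5);
    while (System.currentTimeMillis() < deadline - margin && !allDrained()) {
      Thread.sleep(200);
    }
    containersToPreempt.forEach(this::releaseContainer);
  }

  private void checkpointOrDrain(String containerId) { /* app-specific */ }
  private boolean allDrained() { return true; }
  private void releaseContainer(String containerId) { }
}
{code}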
[jira] [Resolved] (YARN-4229) Support max-am-resource-percentage per label partition for User
[ https://issues.apache.org/jira/browse/YARN-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G resolved YARN-4229. --- Resolution: Won't Fix Marking this as Won't Fix since the same is already handled in YARN-3216. > Support max-am-resource-percentage per label partition for User > --- > > Key: YARN-4229 > URL: https://issues.apache.org/jira/browse/YARN-4229 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > > Similar to YARN-3216, we need to support per-user-per-partition level max AM > resource percentage also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971695#comment-14971695 ] Sunil G commented on YARN-3224: --- Yes. This issue is dependent on YARN-3784, and I have rebased the patch there. Thank you. > Notify AM with containers (on decommissioning node) could be preempted after > timeout. > - > > Key: YARN-3224 > URL: https://issues.apache.org/jira/browse/YARN-3224 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3224.patch, 0002-YARN-3224.patch > > > We should leverage the YARN preemption framework to notify the AM that some > containers will be preempted after a timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue
[ https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971701#comment-14971701 ] Hudson commented on YARN-2913: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #530 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/530/]) YARN-2913. Fair scheduler should have ability to set MaxResourceDefault (mingma: rev 934d96a334598fcf0e5aba2043ff539469025f69) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java > Fair scheduler should have ability to set MaxResourceDefault for each queue > --- > > Key: YARN-2913 > URL: https://issues.apache.org/jira/browse/YARN-2913 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.8.0 > > Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, > YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch > > > Queues that are created on the fly have the max resource of the entire > cluster. Fair Scheduler should have a default maxResource to control the > maxResource of those queues -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971700#comment-14971700 ] Hudson commented on YARN-4009: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #530 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/530/]) YARN-4009. CORS support for ResourceManager REST API. ( Varun Vasudev (jeagles: rev f8adeb712dc834c27cec15c04a986f2f635aba83) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestHttpCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md * hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/http/TestCrossOriginFilter.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/HttpCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/http/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.8.patch, > YARN-4009.LOGGING.patch, YARN-4009.LOGGING.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. 
For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971714#comment-14971714 ] Hudson commented on YARN-4009: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2466 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2466/]) YARN-4009. CORS support for ResourceManager REST API. ( Varun Vasudev (jeagles: rev f8adeb712dc834c27cec15c04a986f2f635aba83) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml * hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/HttpCrossOriginFilterInitializer.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/http/CrossOriginFilter.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/http/TestCrossOriginFilter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestHttpCrossOriginFilterInitializer.java > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, > YARN-4009.006.patch, YARN-4009.007.patch, YARN-4009.8.patch, > YARN-4009.LOGGING.patch, YARN-4009.LOGGING.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. 
For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2913) Fair scheduler should have ability to set MaxResourceDefault for each queue
[ https://issues.apache.org/jira/browse/YARN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971715#comment-14971715 ] Hudson commented on YARN-2913: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2466 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2466/]) YARN-2913. Fair scheduler should have ability to set MaxResourceDefault (mingma: rev 934d96a334598fcf0e5aba2043ff539469025f69) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java > Fair scheduler should have ability to set MaxResourceDefault for each queue > --- > > Key: YARN-2913 > URL: https://issues.apache.org/jira/browse/YARN-2913 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.8.0 > > Attachments: YARN-2913.v1.patch, YARN-2913.v2.patch, > YARN-2913.v3.patch, YARN-2913.v4.patch, YARN-2913.v5.patch > > > Queues that are created on the fly have the max resource of the entire > cluster. Fair Scheduler should have a default maxResource to control the > maxResource of those queues -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4284) condition for AM blacklisting is too narrow
[ https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971759#comment-14971759 ] Sangjin Lee commented on YARN-4284: --- Thanks. I agree that PREEMPTED is unequivocally not a node issue. I'll update the patch shortly. > condition for AM blacklisting is too narrow > --- > > Key: YARN-4284 > URL: https://issues.apache.org/jira/browse/YARN-4284 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4284.001.patch > > > Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the > next app attempt can be assigned to a different node. > However, currently the condition under which the node gets blacklisted is > limited to {{DISKS_FAILED}}. There are a whole host of other issues that may > cause the failure, for which we want to locate the AM elsewhere; e.g. disks > full, JVM crashes, memory issues, etc. > Since the AM blacklisting is per-app, there is little practical downside in > blacklisting the nodes on *any failure* (although it might lead to > blacklisting the node more aggressively than necessary). I would propose > locating the next app attempt to a different node on any failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971817#comment-14971817 ] Anubhav Dhoot commented on YARN-3738: - +1 pending jenkins > Add support for recovery of reserved apps (running under dynamic queues) to > Capacity Scheduler > -- > > Key: YARN-3738 > URL: https://issues.apache.org/jira/browse/YARN-3738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3738-v2.patch, YARN-3738-v3.patch, > YARN-3738-v3.patch, YARN-3738-v4.patch, YARN-3738.patch > > > YARN-3736 persists the current state of the Plan to the RMStateStore. This > JIRA covers recovery of the Plan, i.e. dynamic reservation queues with > associated apps as part Capacity Scheduler failover mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4285) Display resource usage as percentage of queue and cluster in the RM UI
[ https://issues.apache.org/jira/browse/YARN-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971846#comment-14971846 ] Hadoop QA commented on YARN-4285: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 29m 49s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 11m 27s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 17s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 33s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 56s | The applied patch generated 1 new checkstyle issues (total was 10, now 11). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 49s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 8m 9s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 11s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 3s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 3m 7s | Tests failed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 0m 19s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 83m 32s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.timeline.TestRollingLevelDBTimelineStore | | Failed build | hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768364/YARN-4285.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 600ad7b | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9552/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9552/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9552/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9552/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9552/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9552/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9552/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9552/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9552/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9552/console | This message was automatically generated. > Display resource usage as percentage of queue and cluster in the RM UI > -- > > Key: YARN-4285 > URL: https://issues.apache.org/jira/browse/YARN-4285 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4285.001.patch, YARN-4285.002.patch, > YARN-4285.003.patch, YARN-4285.004.patch > > > Currently, we display the memory and vcores allocated to an app in the RM UI. > It would be useful to display the resources consumed as a %of the queue and > the cluster to identify apps that are using a lot of resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971852#comment-14971852 ] Hudson commented on YARN-4041: -- FAILURE: Integrated in Hadoop-trunk-Commit #8697 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8697/]) YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java > Slow delegation token renewal can severely prolong RM recovery > -- > > Key: YARN-4041 > URL: https://issues.apache.org/jira/browse/YARN-4041 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Fix For: 2.7.2 > > Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, > 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch > > > When the RM does a work-preserving restart it synchronously tries to renew > delegation tokens for every active application. If a token server happens to > be down or is running slow and a lot of the active apps were using tokens > from that server then it can have a huge impact on the time it takes the RM > to process the restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
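The shape of the fix is to stop blocking recovery on each renewal. The sketch below shows the general pattern of handing renewals to a bounded background pool; it is a generic illustration, not the actual DelegationTokenRenewer change.
{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Generic illustration: during recovery, schedule one renewal task per app on a
// background pool instead of renewing synchronously on the recovery path, so a
// slow or dead token server only delays the apps that depend on it.
public class AsyncTokenRenewal {
  private final ExecutorService renewerPool = Executors.newFixedThreadPool(10);

  public void recoverApplications(List<String> activeAppIds) {
    for (String appId : activeAppIds) {
      renewerPool.submit(() -> renewTokensFor(appId));  // non-blocking for recovery
    }
    // RM recovery continues here without waiting for the renewals above.
  }

  private void renewTokensFor(String appId) {
    // Placeholder for contacting the token server; failures would be retried or
    // surfaced per application rather than stalling the whole restart.
  }
}
{code}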
[jira] [Commented] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM
[ https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971905#comment-14971905 ] Hadoop QA commented on YARN-1565: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 24m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 3m 37s | Site still builds. | | {color:red}-1{color} | checkstyle | 1m 41s | The applied patch generated 1 new checkstyle issues (total was 212, now 212). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 46s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 29s | The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 62m 57s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 121m 18s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-api | | FindBugs | module:hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768358/YARN-1565-004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / 600ad7b | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9551/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9551/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9551/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9551/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9551/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9551/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9551/console | This message was automatically generated. 
> Add a way for YARN clients to get critical YARN system properties from the RM > - > > Key: YARN-1565 > URL: https://issues.apache.org/jira/browse/YARN-1565 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Steve Loughran > Attachments: YARN-1565-001.patch, YARN-1565-002.patch, > YARN-1565-003.patch, YARN-1565-004.patch > > > If you are trying to build up an AM request, you need to know > # the limits of memory, core &c for the chosen queue > # the existing YARN classpath > # the path separator for the target platform (so your classpath comes out > right) > # cluster OS: in case you need some OS-specific changes > The classpath can be in yarn-site.xml, but a remote client may not have that. > The site-xml file doesn't list Queue resource limits, cluster OS or the path > separator. > A way to query the RM for these values would make it easier for YARN clients > to build up AM submissions with less guesswork and client-side config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971966#comment-14971966 ] Hudson commented on YARN-4041: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #576 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/576/]) YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java > Slow delegation token renewal can severely prolong RM recovery > -- > > Key: YARN-4041 > URL: https://issues.apache.org/jira/browse/YARN-4041 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Fix For: 2.7.2 > > Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, > 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch > > > When the RM does a work-preserving restart it synchronously tries to renew > delegation tokens for every active application. If a token server happens to > be down or is running slow and a lot of the active apps were using tokens > from that server then it can have a huge impact on the time it takes the RM > to process the restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971998#comment-14971998 ] Hudson commented on YARN-4041: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #589 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/589/]) YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java > Slow delegation token renewal can severely prolong RM recovery > -- > > Key: YARN-4041 > URL: https://issues.apache.org/jira/browse/YARN-4041 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Fix For: 2.7.2 > > Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, > 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch > > > When the RM does a work-preserving restart it synchronously tries to renew > delegation tokens for every active application. If a token server happens to > be down or is running slow and a lot of the active apps were using tokens > from that server then it can have a huge impact on the time it takes the RM > to process the restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972103#comment-14972103 ] Hudson commented on YARN-4041: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1312 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1312/]) YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java > Slow delegation token renewal can severely prolong RM recovery > -- > > Key: YARN-4041 > URL: https://issues.apache.org/jira/browse/YARN-4041 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Fix For: 2.7.2 > > Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, > 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch > > > When the RM does a work-preserving restart it synchronously tries to renew > delegation tokens for every active application. If a token server happens to > be down or is running slow and a lot of the active apps were using tokens > from that server then it can have a huge impact on the time it takes the RM > to process the restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972110#comment-14972110 ] Wangda Tan commented on YARN-4280: -- [~jlowe], agreed, this is a hard problem to solve; allowing reservations beyond a queue's max capacity may bring more problems. As I mentioned above, preemption may be one solution: its goal is to ensure an under-utilized queue can eventually get something. We can minimize its side effects by tuning its parameters (such as reducing the number of containers preempted each round, and disabling preemption for some critical queues, etc.). > CapacityScheduler reservations may not prevent indefinite postponement on a > busy cluster > > > Key: YARN-4280 > URL: https://issues.apache.org/jira/browse/YARN-4280 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.6.1, 2.8.0, 2.7.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > > Consider the following scenario: > There are 2 queues A(25% of the total capacity) and B(75%), both can run at > total cluster capacity. There are 2 applications, appX that runs on Queue A, > always asking for 1G containers (non-AM) and appY runs on Queue B asking for 2 > GB containers. > The user limit is high enough for the application to reach 100% of the > cluster resource. > appX is running at total cluster capacity, full with 1G containers releasing > only one container at a time. appY comes in with a request of 2GB container > but only 1 GB is free. Ideally, since appY is in the underserved queue, it > has higher priority and should reserve for its 2 GB request. Since this > request puts the alloc+reserve above total capacity of the cluster, > reservation is not made. appX comes in with a 1GB request and since 1GB is > still available, the request is allocated. > This can continue indefinitely, causing priority inversion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
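To make the starvation loop concrete, the arithmetic from the scenario above can be replayed directly; the check is paraphrased from the description, not taken from the scheduler's code.
{code:java}
// Paraphrased check from the scenario: a reservation is skipped when it would
// push allocated + already-reserved + the new request past the cluster total.
public class ReservationStarvationDemo {
  public static void main(String[] args) {
    long clusterMb   = 100 * 1024;       // example cluster size
    long allocatedMb = clusterMb - 1024; // appX fills all but 1GB with 1GB containers
    long reservedMb  = 0;
    long appYRequest = 2 * 1024;         // appY (under-served queue) wants 2GB

    boolean canReserve = allocatedMb + reservedMb + appYRequest <= clusterMb;
    System.out.println("appY may reserve: " + canReserve);   // false -> appY waits

    long appXRequest = 1024;             // appX asks again for 1GB
    boolean canAllocateToAppX = allocatedMb + appXRequest <= clusterMb;
    System.out.println("appX gets the freed 1GB: " + canAllocateToAppX); // true
    // Repeat indefinitely: appY never reserves, appX keeps winning -> starvation.
  }
}
{code}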
[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972117#comment-14972117 ] Hudson commented on YARN-4041: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2521 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2521/]) YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java > Slow delegation token renewal can severely prolong RM recovery > -- > > Key: YARN-4041 > URL: https://issues.apache.org/jira/browse/YARN-4041 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Fix For: 2.7.2 > > Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, > 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch > > > When the RM does a work-preserving restart it synchronously tries to renew > delegation tokens for every active application. If a token server happens to > be down or is running slow and a lot of the active apps were using tokens > from that server then it can have a huge impact on the time it takes the RM > to process the restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels
[ https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972135#comment-14972135 ] Wangda Tan commented on YARN-3216: -- [~sunilg], thanks for the update. The latest patch looks good to me; I will commit in a few days if there are no objections. > Max-AM-Resource-Percentage should respect node labels > - > > Key: YARN-3216 > URL: https://issues.apache.org/jira/browse/YARN-3216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, > 0003-YARN-3216.patch, 0004-YARN-3216.patch, 0005-YARN-3216.patch, > 0006-YARN-3216.patch, 0007-YARN-3216.patch, 0008-YARN-3216.patch, > 0009-YARN-3216.patch, 0010-YARN-3216.patch, 0011-YARN-3216.patch > > > Currently, max-am-resource-percentage considers default_partition only. When > a queue can access multiple partitions, we should be able to compute > max-am-resource-percentage based on that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
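For readers not familiar with the YARN-3216 description quoted above, the per-partition extension amounts to computing an AM limit against each partition's resource rather than only the default partition's. A minimal sketch, assuming the Resources.multiply helper; the class, method, and parameter names are illustrative and do not come from the patch itself.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class PerPartitionAmLimitSketch {
  // Illustrative only:
  //   amLimit(partition) = partitionResource
  //                        * queueCapacityOnPartition
  //                        * maxAMResourcePercent
  public static Resource amLimitForPartition(Resource partitionResource,
      float queueCapacityOnPartition, float maxAMResourcePercent) {
    return Resources.multiply(partitionResource,
        queueCapacityOnPartition * maxAMResourcePercent);
  }
}
{code}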
[jira] [Updated] (YARN-4284) condition for AM blacklisting is too narrow
[ https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4284: -- Attachment: YARN-4284.002.patch v.2 patch posted. Added PREEMPTED as a condition for not blacklisting nodes. Also added a unit test for it. > condition for AM blacklisting is too narrow > --- > > Key: YARN-4284 > URL: https://issues.apache.org/jira/browse/YARN-4284 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4284.001.patch, YARN-4284.002.patch > > > Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the > next app attempt can be assigned to a different node. > However, currently the condition under which the node gets blacklisted is > limited to {{DISKS_FAILED}}. There are a whole host of other issues that may > cause the failure, for which we want to locate the AM elsewhere; e.g. disks > full, JVM crashes, memory issues, etc. > Since the AM blacklisting is per-app, there is little practical downside in > blacklisting the nodes on *any failure* (although it might lead to > blacklisting the node more aggressively than necessary). I would propose > locating the next app attempt to a different node on any failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
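The v.2 change described above (treat a preempted AM container as not the node's fault) can be pictured as a simple check over container exit codes. Only the ContainerExitStatus constants below are real YARN API values; the helper class and method are hypothetical and are not the code in the attached patch.
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

public class AmBlacklistSketch {
  // Hypothetical helper: should this AM container exit count towards
  // blacklisting its node for the next app attempt?
  public static boolean shouldBlacklistNode(int exitStatus) {
    switch (exitStatus) {
      case ContainerExitStatus.PREEMPTED:                 // scheduler reclaimed the container
      case ContainerExitStatus.ABORTED:                   // released by the framework/RM
      case ContainerExitStatus.KILLED_BY_RESOURCEMANAGER:
        return false;                                     // not a sign of a bad node
      default:
        // Disks failed, JVM crash, memory issues, ... : prefer a different
        // node for the next attempt, per the proposal in the description.
        return true;
    }
  }
}
{code}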
[jira] [Updated] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-3738: - Attachment: YARN-3738-v4.patch Retriggering Jenkins with the same patch. > Add support for recovery of reserved apps (running under dynamic queues) to > Capacity Scheduler > -- > > Key: YARN-3738 > URL: https://issues.apache.org/jira/browse/YARN-3738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3738-v2.patch, YARN-3738-v3.patch, > YARN-3738-v3.patch, YARN-3738-v4.patch, YARN-3738-v4.patch, YARN-3738.patch > > > YARN-3736 persists the current state of the Plan to the RMStateStore. This > JIRA covers recovery of the Plan, i.e. dynamic reservation queues with > associated apps as part Capacity Scheduler failover mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4285) Display resource usage as percentage of queue and cluster in the RM UI
[ https://issues.apache.org/jira/browse/YARN-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972161#comment-14972161 ] Wangda Tan commented on YARN-4285: -- Tried this patch locally; it looks great! Also tried sorting applications by queue percentage and by cluster percentage, and both work as expected. Will commit shortly if there are no objections. > Display resource usage as percentage of queue and cluster in the RM UI > -- > > Key: YARN-4285 > URL: https://issues.apache.org/jira/browse/YARN-4285 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4285.001.patch, YARN-4285.002.patch, > YARN-4285.003.patch, YARN-4285.004.patch > > > Currently, we display the memory and vcores allocated to an app in the RM UI. > It would be useful to display the resources consumed as a % of the queue and > the cluster to identify apps that are using a lot of resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
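For reference, the new columns discussed above reduce to a ratio of the application's used resources to the queue's and the cluster's totals. A minimal sketch computed on memory only, using the Resource.getMemory() accessor; the class and method names are illustrative, and the actual patch also has to handle vcores and the RM web UI plumbing.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class UsagePercentSketch {
  // Illustrative only: what percentage of 'total' does 'used' represent,
  // measured on memory. Pass the queue's resource to get %-of-queue and the
  // cluster's resource to get %-of-cluster.
  public static float percentOf(Resource used, Resource total) {
    if (used == null || total == null || total.getMemory() <= 0) {
      return 0f;
    }
    return 100f * used.getMemory() / total.getMemory();
  }
}
{code}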
[jira] [Commented] (YARN-3738) Add support for recovery of reserved apps (running under dynamic queues) to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972167#comment-14972167 ] Hadoop QA commented on YARN-3738: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 24m 8s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 12m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 15m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 43s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 19s | The applied patch generated 1 new checkstyle issues (total was 189, now 188). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 2m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 55s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 65m 22s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 125m 46s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestSchedulingPolicy | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768376/YARN-3738-v4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 15eb84b | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9553/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9553/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9553/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9553/console | This message was automatically generated. > Add support for recovery of reserved apps (running under dynamic queues) to > Capacity Scheduler > -- > > Key: YARN-3738 > URL: https://issues.apache.org/jira/browse/YARN-3738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3738-v2.patch, YARN-3738-v3.patch, > YARN-3738-v3.patch, YARN-3738-v4.patch, YARN-3738-v4.patch, YARN-3738.patch > > > YARN-3736 persists the current state of the Plan to the RMStateStore. This > JIRA covers recovery of the Plan, i.e. dynamic reservation queues with > associated apps as part Capacity Scheduler failover mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4169) jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
[ https://issues.apache.org/jira/browse/YARN-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972168#comment-14972168 ] Wangda Tan commented on YARN-4169: -- Thanks for the explanation, [~Naganarasimha]. The fix looks good and safe. Not sure if the failed tests are related to the changes; re-kicking Jenkins. > jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels > - > > Key: YARN-4169 > URL: https://issues.apache.org/jira/browse/YARN-4169 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-4169.v1.001.patch, YARN-4169.v1.002.patch, > YARN-4169.v1.003.patch, YARN-4169.v1.004.patch > > > Test failing in [[Jenkins build > 402|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/402/testReport/junit/org.apache.hadoop.yarn.server.nodemanager/TestNodeStatusUpdaterForLabels/testNodeStatusUpdaterForNodeLabels/] > {code} > java.lang.NullPointerException: null > at java.util.HashSet.<init>(HashSet.java:118) > at > org.apache.hadoop.yarn.nodelabels.NodeLabelTestBase.assertNLCollectionEquals(NodeLabelTestBase.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels(TestNodeStatusUpdaterForLabels.java:268) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
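The NullPointerException in the quoted stack trace is the classic failure of passing a null collection to new HashSet<>(...). Independent of the actual YARN-4169 patch, the general fix direction can be sketched as a null-safe comparison; the class and method below are hypothetical stand-ins, not the real NodeLabelTestBase code.
{code}
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class NullSafeLabelCompareSketch {
  // new HashSet<>(null) throws NPE, which is exactly the failure in the quoted
  // trace; normalize null to an empty set before copying and comparing.
  public static <T> boolean sameElements(Collection<T> expected,
      Collection<T> actual) {
    Set<T> e = expected == null
        ? Collections.<T>emptySet() : new HashSet<T>(expected);
    Set<T> a = actual == null
        ? Collections.<T>emptySet() : new HashSet<T>(actual);
    return e.equals(a);
  }
}
{code}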
[jira] [Updated] (YARN-4169) jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
[ https://issues.apache.org/jira/browse/YARN-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4169: Attachment: YARN-4162.v2.005.patch [~wangda], the test failures do not seem to be related to the patch. The Findbugs warning shown here is related to YARN-1897 (I can take care of it in this patch if that is OK, since it is a small fix!). Re-uploading the patch after fixing the whitespace issues (again not from the modified lines!) and the checkstyle issue. > jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels > - > > Key: YARN-4169 > URL: https://issues.apache.org/jira/browse/YARN-4169 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-4162.v2.005.patch, YARN-4169.v1.001.patch, > YARN-4169.v1.002.patch, YARN-4169.v1.003.patch, YARN-4169.v1.004.patch > > > Test failing in [[Jenkins build > 402|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/402/testReport/junit/org.apache.hadoop.yarn.server.nodemanager/TestNodeStatusUpdaterForLabels/testNodeStatusUpdaterForNodeLabels/] > {code} > java.lang.NullPointerException: null > at java.util.HashSet.<init>(HashSet.java:118) > at > org.apache.hadoop.yarn.nodelabels.NodeLabelTestBase.assertNLCollectionEquals(NodeLabelTestBase.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels(TestNodeStatusUpdaterForLabels.java:268) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4169) jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
[ https://issues.apache.org/jira/browse/YARN-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972183#comment-14972183 ] Naganarasimha G R commented on YARN-4169: - Oops, didn't see this message; it was triggered by the new patch! > jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels > - > > Key: YARN-4169 > URL: https://issues.apache.org/jira/browse/YARN-4169 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-4162.v2.005.patch, YARN-4169.v1.001.patch, > YARN-4169.v1.002.patch, YARN-4169.v1.003.patch, YARN-4169.v1.004.patch > > > Test failing in [[Jenkins build > 402|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/402/testReport/junit/org.apache.hadoop.yarn.server.nodemanager/TestNodeStatusUpdaterForLabels/testNodeStatusUpdaterForNodeLabels/] > {code} > java.lang.NullPointerException: null > at java.util.HashSet.<init>(HashSet.java:118) > at > org.apache.hadoop.yarn.nodelabels.NodeLabelTestBase.assertNLCollectionEquals(NodeLabelTestBase.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels(TestNodeStatusUpdaterForLabels.java:268) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1510) Make NMClient support change container resources
[ https://issues.apache.org/jira/browse/YARN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972180#comment-14972180 ] Wangda Tan commented on YARN-1510: -- Thanks for the update, [~mding]. Could you add comments to the deprecated class to mention which new methods should be used instead? > Make NMClient support change container resources > > > Key: YARN-1510 > URL: https://issues.apache.org/jira/browse/YARN-1510 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1510-YARN-1197.1.patch, > YARN-1510-YARN-1197.2.patch, YARN-1510.3.patch, YARN-1510.4.patch, > YARN-1510.5.patch > > > As described in YARN-1197, YARN-1449, we need add API in NMClient to support > 1) sending request of increase/decrease container resource limits > 2) get succeeded/failed changed containers response from NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
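The review ask above (have the deprecated entry points say what replaces them) is normally handled with a @deprecated javadoc tag plus the @Deprecated annotation. The sketch below only shows that documentation pattern; both method names are hypothetical and are not the actual NMClient API.
{code}
public abstract class ResourceChangeClientSketch {

  /**
   * @deprecated Hypothetical example only: point callers at the replacement,
   *             e.g. use {@link #increaseContainerResourceAsync(Object)} instead.
   */
  @Deprecated
  public void changeContainerResource(Object container) {
    // delegate to the new entry point so old callers keep working
    increaseContainerResourceAsync(container);
  }

  /** Hypothetical replacement API referenced from the deprecated method above. */
  public abstract void increaseContainerResourceAsync(Object container);
}
{code}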
[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972200#comment-14972200 ] Hudson commented on YARN-4041: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #531 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/531/]) YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java > Slow delegation token renewal can severely prolong RM recovery > -- > > Key: YARN-4041 > URL: https://issues.apache.org/jira/browse/YARN-4041 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Fix For: 2.7.2 > > Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, > 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch > > > When the RM does a work-preserving restart it synchronously tries to renew > delegation tokens for every active application. If a token server happens to > be down or is running slow and a lot of the active apps were using tokens > from that server then it can have a huge impact on the time it takes the RM > to process the restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972233#comment-14972233 ] Wangda Tan commented on YARN-4287: -- [~nroberts], thanks for updating. Some thoughts regarding your comments: bq. This is a behavior change, but I can't think of any good cases where someone would prefer the old behavior to the new. Let me know if you can think of some. I agree with you; most of your changes are good, and I would prefer to enable this to get better performance. But I can still think of some edge cases, and I'd prefer to keep the old behavior to avoid surprises :). Let me explain more: There are several behavior changes in your patch: 1. rack-delay = min(computed-offswitch-delay, configured-rack-delay). When a large configured-rack-delay is specified, it falls back to the old behavior, so this is safe to me. And I think what you mentioned before: bq. I didn't separate them in this version of the patch because I still want to be able to specify rack-locality-delay BUT have the computed delay take effect when an application is not asking for locality OR is really small. makes sense to me. I just feel the current way to compute the off-switch delay needs to be improved; I will add an example below. 2. node-delay = min(rack-delay, node-delay). If a cluster has 40 nodes and a user requests 3 containers on node1: {code} Assume configured-rack-delay = 50: rack-delay = min(3 (#requested-containers) * 1 (#requested-resource-names) / 40, 50) = 0. So: node-delay = min(rack-delay, 40) = 0 {code} In the above example, no matter how rack-delay is specified or computed, if we can keep the node-delay at 40, we have a better chance of getting node-local containers allocated. 3. Don't reset the missed-opportunity count if a rack-local container is allocated. The benefit of this change is obvious - we get faster rack-local container allocation. But I feel this can also affect node-local container allocation (if the application asks for only a small subset of nodes in a rack), which may lead to a performance regression for locality/IO-sensitive applications. > Capacity Scheduler: Rack Locality improvement > - > > Key: YARN-4287 > URL: https://issues.apache.org/jira/browse/YARN-4287 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-4287-v2.patch, YARN-4287-v3.patch, YARN-4287.patch > > > YARN-4189 does an excellent job describing the issues with the current delay > scheduling algorithms within the capacity scheduler. The design proposal also > seems like a good direction. > This jira proposes a simple interim solution to the key issue we've been > experiencing on a regular basis: > - rackLocal assignments trickle out due to nodeLocalityDelay. This can have > significant impact on things like CombineFileInputFormat which targets very > specific nodes in its split calculations. > I'm not sure when YARN-4189 will become reality so I thought a simple interim > patch might make sense. The basic idea is simple: > 1) Separate delays for rackLocal, and OffSwitch (today there is only 1) > 2) When we're getting rackLocal assignments, subsequent rackLocal assignments > should not be delayed > Patch will be uploaded shortly. No big deal if the consensus is to go > straight to YARN-4189. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
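The arithmetic in point 2 of the comment above (a small request collapses the computed off-switch delay to 0, which then drags the node delay down with it) can be written out directly. This sketch mirrors the formula quoted in the comment, not the actual CapacityScheduler code; all names are illustrative.
{code}
public class LocalityDelaySketch {
  // Mirrors the computation quoted in the comment:
  //   computed-offswitch-delay = requestedContainers * requestedResourceNames / clusterNodes
  //   rack-delay = min(computed-offswitch-delay, configured-rack-delay)
  //   node-delay = min(rack-delay, configured-node-delay)
  public static int nodeDelay(int requestedContainers, int requestedResourceNames,
      int clusterNodes, int configuredRackDelay, int configuredNodeDelay) {
    int computedOffSwitchDelay =
        requestedContainers * requestedResourceNames / clusterNodes;      // 3 * 1 / 40 = 0
    int rackDelay = Math.min(computedOffSwitchDelay, configuredRackDelay); // min(0, 50) = 0
    return Math.min(rackDelay, configuredNodeDelay);                       // min(0, 40) = 0
  }
}
{code}
With the numbers from the example (3 containers, 1 resource name, 40 nodes, rack delay 50, node delay 40), nodeDelay(3, 1, 40, 50, 40) returns 0, which is exactly the degenerate case the comment argues should instead preserve a node delay of 40.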
[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972242#comment-14972242 ] Hudson commented on YARN-4041: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2467 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2467/]) YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev d3a34a4f388155f6a7ef040e244ce7be788cd28b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/CHANGES.txt > Slow delegation token renewal can severely prolong RM recovery > -- > > Key: YARN-4041 > URL: https://issues.apache.org/jira/browse/YARN-4041 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Sunil G > Fix For: 2.7.2 > > Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, > 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch > > > When the RM does a work-preserving restart it synchronously tries to renew > delegation tokens for every active application. If a token server happens to > be down or is running slow and a lot of the active apps were using tokens > from that server then it can have a huge impact on the time it takes the RM > to process the restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)