[jira] [Commented] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067533#comment-16067533 ] Daniel Templeton commented on YARN-6743: Yep, [~lori.lob...@cloudera.com], this patch is fine. All's well. > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Fix For: 3.0.0-alpha4 > > Attachments: YARN-6743.001.patch, YARN-6743.002.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
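For reference, a sketch of what the corrected property entry in yarn-default.xml would look like once the missing spaces are added. The description text and property name are taken from this issue; the 1048576 default is assumed from the "1MB" mentioned in the description and should be checked against the committed patch.
{code}
<property>
  <description>Specifies the maximum size of the data that can be stored
    in a znode. Value should be same or less than jute.maxbuffer configured
    in zookeeper. Default value configured is 1MB.</description>
  <name>yarn.resourcemanager.zk-max-znode-size.bytes</name>
  <value>1048576</value>
</property>
{code}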
[jira] [Commented] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067531#comment-16067531 ] Lori Loberg commented on YARN-6743: --- Does that mean YARN-6743 is still okay? I read through the last half of YARN-5006, but it's still a puzzle to me. Both are marked as Resolved. > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Fix For: 3.0.0-alpha4 > > Attachments: YARN-6743.001.patch, YARN-6743.002.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-6743: --- Affects Version/s: 2.9.0 > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Fix For: 3.0.0-alpha4 > > Attachments: YARN-6743.001.patch, YARN-6743.002.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-6743: --- Target Version/s: 2.9.0, 3.0.0-beta1 (was: 3.0.0-beta1) > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Fix For: 3.0.0-alpha4 > > Attachments: YARN-6743.001.patch, YARN-6743.002.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067495#comment-16067495 ] Daniel Templeton commented on YARN-6743: Looks like the spaces are added in the branch-2 patch of YARN-5006, but the tabs issue still exists. I went ahead and pulled this into branch-2 to deal with the tabs. > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Fix For: 3.0.0-alpha4 > > Attachments: YARN-6743.001.patch, YARN-6743.002.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey
[ https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067471#comment-16067471 ] Ray Chiang commented on YARN-6150: -- I thought I had done this before, but the latest version of this patch doesn't seem to fix the unit test in branch-2. > TestContainerManagerSecurity tests for Yarn Server are flakey > - > > Key: YARN-6150 > URL: https://issues.apache.org/jira/browse/YARN-6150 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Daniel Sturman >Assignee: Daniel Sturman > Attachments: YARN-6150.001.patch, YARN-6150.002.patch, > YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, > YARN-6150.006.patch, YARN-6150.007.patch > > > Repeated runs of > {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either > pass or fail on the same codebase. Also, the two runs (one > in secure mode, one without security) aren't well labeled in JUnit. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067451#comment-16067451 ] Sunil G commented on YARN-6678: --- I think we might need to debug more here, as I am still getting a test error. > Committer thread crashes with IllegalStateException in async-scheduling mode > of CapacityScheduler > - > > Key: YARN-6678 > URL: https://issues.apache.org/jira/browse/YARN-6678 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Tao Yang >Assignee: Tao Yang > Attachments: YARN-6678.001.patch, YARN-6678.002.patch, > YARN-6678.003.patch > > > Error log: > {noformat} > java.lang.IllegalStateException: Trying to reserve container > container_e10_1495599791406_7129_01_001453 for application > appattempt_1495599791406_7129_01 when currently reserved container > container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 > #containers=40 available=... used=... > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546) > {noformat} > Reproduce this problem: > 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1 > 2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and > allocated app-1/container-X2 > 3. nm1 reserved app-2/container-Y > 4. proposal-1 was accepted but threw IllegalStateException when applied > Currently the check code for the reserve proposal in FiCaSchedulerApp#accept is as > follows: > {code} > // Container reserved first time will be NEW, after the container > // accepted & confirmed, it will become RESERVED state > if (schedulerContainer.getRmContainer().getState() > == RMContainerState.RESERVED) { > // Set reReservation == true > reReservation = true; > } else { > // When reserve a resource (state == NEW is for new container, > // state == RUNNING is for increase container). > // Just check if the node is not already reserved by someone > if (schedulerContainer.getSchedulerNode().getReservedContainer() > != null) { > if (LOG.isDebugEnabled()) { > LOG.debug("Try to reserve a container, but the node is " > + "already reserved by another container=" > + schedulerContainer.getSchedulerNode() > .getReservedContainer().getContainerId()); > } > return false; > } > } > {code} > The reserved container on the node is checked only > for first-reserve containers. > We should also confirm that the container reserved on this node is equal to the re-reserved > container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
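For illustration, a minimal sketch of the strengthened check described above: on a re-reservation, the proposal should only be accepted if the container currently reserved on the node is the very container being re-reserved. Names follow the snippet quoted in the description; this is a sketch of the idea, not the committed patch.
{code}
// Inside FiCaSchedulerApp#accept, when handling a reserve proposal (sketch).
RMContainer nodeReserved =
    schedulerContainer.getSchedulerNode().getReservedContainer();
if (schedulerContainer.getRmContainer().getState()
    == RMContainerState.RESERVED) {
  // Re-reservation: the node must still be reserved by this same container,
  // otherwise the proposal is outdated (e.g. steps 1-4 above) and must be
  // rejected instead of applied.
  if (nodeReserved == null || !nodeReserved.getContainerId().equals(
      schedulerContainer.getRmContainer().getContainerId())) {
    return false;
  }
  reReservation = true;
} else {
  // First-time reservation: just check the node is not reserved by someone.
  if (nodeReserved != null) {
    return false;
  }
}
{code}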
[jira] [Commented] (YARN-6280) Introduce deselect query param to skip ResourceRequest from getApp/getApps REST API
[ https://issues.apache.org/jira/browse/YARN-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067442#comment-16067442 ] Hudson commented on YARN-6280: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11947 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11947/]) YARN-6280. Introduce deselect query param to skip ResourceRequest from (sunilg: rev c1edca101c32a5999100bc6031784274d416b599) * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DeSelectFields.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServiceProtocol.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md > Introduce deselect query param to skip ResourceRequest from getApp/getApps > REST API > --- > > Key: YARN-6280 > URL: https://issues.apache.org/jira/browse/YARN-6280 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, restapi >Affects Versions: 2.7.3 >Reporter: Lantao Jin >Assignee: Lantao Jin > Attachments: YARN-6280.001.patch, YARN-6280.002.patch, > YARN-6280.003.patch, YARN-6280.004.patch, YARN-6280.005.patch, > YARN-6280.006.patch, YARN-6280.007.patch, YARN-6280.008.patch, > YARN-6280.009.patch, YARN-6280.010.patch, YARN-6280.011.patch > > > Beginning from v2.7, the ResourceManager Cluster Applications REST API returns > the ResourceRequest list. It's a very large construct in AppInfo. > As a test, we use the below URI to query only 2 results: > http://<rm address:port>/ws/v1/cluster/apps?states=running,accepted&limit=2 > The results are very different: > ||Hadoop version||Total Character||Total Word||Total Lines||Size|| > |2.4.1|1192| 42| 42| 1.2 KB| > |2.7.1|1222179| 48740| 48735| 1.21 MB| > Most RESTful API requesters don't know about this after upgrading, and their > old queries may cause the ResourceManager more GC load and slowness. Even if > they know this, they have no way to reduce the impact on the ResourceManager > except slowing down their query frequency. > The patch adds a query parameter "showResourceRequests" to help requesters > who don't need this information reduce the overhead. In consideration of > interface compatibility, the default value is true if they don't set the > parameter, so the behaviour is the same as now. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
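In practical terms, this change lets REST callers that do not need the (potentially huge) ResourceRequest list ask the ResourceManager to omit it. A hedged usage sketch, with the parameter name inferred from the DeSelectFields class added above; the authoritative syntax is in the updated ResourceManagerRest.md:
{noformat}
GET http://<rm address:port>/ws/v1/cluster/apps?states=running,accepted&limit=2&deSelects=resourceRequests
{noformat}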
[jira] [Commented] (YARN-5311) Document graceful decommission CLI and usage
[ https://issues.apache.org/jira/browse/YARN-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067408#comment-16067408 ] Hudson commented on YARN-5311: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11946 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11946/]) YARN-5311. Document graceful decommission CLI and usage. Contributed by (junping_du: rev 4e3eebc943835077e3dd0df9e0b9239ae604cb89) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/GracefulDecommission.md * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md > Document graceful decommission CLI and usage > > > Key: YARN-5311 > URL: https://issues.apache.org/jira/browse/YARN-5311 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Affects Versions: 2.9.0 >Reporter: Junping Du >Assignee: Elek, Marton > Fix For: 2.9.0, 3.0.0-alpha4 > > Attachments: YARN-5311.001.patch, YARN-5311.002.patch, > YARN-5311.003.patch, YARN-5311.004.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
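For readers looking for the feature itself: graceful decommission is driven through the rmadmin CLI by adding a graceful flag (with an optional per-invocation timeout) to the existing refreshNodes command. The syntax below is assumed and should be verified against the new GracefulDecommission.md:
{noformat}
# Gracefully decommission the nodes listed in the exclude file, waiting up
# to 3600 seconds for their running applications to complete (syntax assumed;
# see GracefulDecommission.md for the authoritative form).
yarn rmadmin -refreshNodes -g 3600 -client
{noformat}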
[jira] [Commented] (YARN-6280) Introduce deselect query param to skip ResourceRequest from getApp/getApps REST API
[ https://issues.apache.org/jira/browse/YARN-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067406#comment-16067406 ] Sunil G commented on YARN-6280: --- Committed to trunk. [~cltlfcjin], could you please help to share a branch-2 patch? > Introduce deselect query param to skip ResourceRequest from getApp/getApps > REST API > --- > > Key: YARN-6280 > URL: https://issues.apache.org/jira/browse/YARN-6280 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, restapi >Affects Versions: 2.7.3 >Reporter: Lantao Jin >Assignee: Lantao Jin > Attachments: YARN-6280.001.patch, YARN-6280.002.patch, > YARN-6280.003.patch, YARN-6280.004.patch, YARN-6280.005.patch, > YARN-6280.006.patch, YARN-6280.007.patch, YARN-6280.008.patch, > YARN-6280.009.patch, YARN-6280.010.patch, YARN-6280.011.patch > > > Beginning from v2.7, the ResourceManager Cluster Applications REST API returns > the ResourceRequest list. It's a very large construct in AppInfo. > As a test, we use the below URI to query only 2 results: > http://<rm address:port>/ws/v1/cluster/apps?states=running,accepted&limit=2 > The results are very different: > ||Hadoop version||Total Character||Total Word||Total Lines||Size|| > |2.4.1|1192| 42| 42| 1.2 KB| > |2.7.1|1222179| 48740| 48735| 1.21 MB| > Most RESTful API requesters don't know about this after upgrading, and their > old queries may cause the ResourceManager more GC load and slowness. Even if > they know this, they have no way to reduce the impact on the ResourceManager > except slowing down their query frequency. > The patch adds a query parameter "showResourceRequests" to help requesters > who don't need this information reduce the overhead. In consideration of > interface compatibility, the default value is true if they don't set the > parameter, so the behaviour is the same as now. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6280) Introduce deselect query param to skip ResourceRequest from getApp/getApps REST API
[ https://issues.apache.org/jira/browse/YARN-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-6280: -- Summary: Introduce deselect query param to skip ResourceRequest from getApp/getApps REST API (was: Add a query parameter in ResourceManager Cluster Applications REST API to control whether or not returns ResourceRequest) > Introduce deselect query param to skip ResourceRequest from getApp/getApps > REST API > --- > > Key: YARN-6280 > URL: https://issues.apache.org/jira/browse/YARN-6280 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, restapi >Affects Versions: 2.7.3 >Reporter: Lantao Jin >Assignee: Lantao Jin > Attachments: YARN-6280.001.patch, YARN-6280.002.patch, > YARN-6280.003.patch, YARN-6280.004.patch, YARN-6280.005.patch, > YARN-6280.006.patch, YARN-6280.007.patch, YARN-6280.008.patch, > YARN-6280.009.patch, YARN-6280.010.patch, YARN-6280.011.patch > > > Beginning from v2.7, the ResourceManager Cluster Applications REST API returns > the ResourceRequest list. It's a very large construct in AppInfo. > As a test, we use the below URI to query only 2 results: > http://<rm address:port>/ws/v1/cluster/apps?states=running,accepted&limit=2 > The results are very different: > ||Hadoop version||Total Character||Total Word||Total Lines||Size|| > |2.4.1|1192| 42| 42| 1.2 KB| > |2.7.1|1222179| 48740| 48735| 1.21 MB| > Most RESTful API requesters don't know about this after upgrading, and their > old queries may cause the ResourceManager more GC load and slowness. Even if > they know this, they have no way to reduce the impact on the ResourceManager > except slowing down their query frequency. > The patch adds a query parameter "showResourceRequests" to help requesters > who don't need this information reduce the overhead. In consideration of > interface compatibility, the default value is true if they don't set the > parameter, so the behaviour is the same as now. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6746) SchedulerUtils.checkResourceRequestMatchingNodePartition() is dead code
[ https://issues.apache.org/jira/browse/YARN-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067373#comment-16067373 ] Sunil G commented on YARN-6746: --- Yes [~templedf]. Thanks for pointing it out. We stopped using this after YARN-6040. We could remove this stale function. > SchedulerUtils.checkResourceRequestMatchingNodePartition() is dead code > --- > > Key: YARN-6746 > URL: https://issues.apache.org/jira/browse/YARN-6746 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.1, 3.0.0-alpha3 >Reporter: Daniel Templeton >Assignee: Deepti Sawhney >Priority: Minor > Labels: newbie > > The function is unused. It also appears to be broken. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6746) SchedulerUtils.checkResourceRequestMatchingNodePartition() is dead code
[ https://issues.apache.org/jira/browse/YARN-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton reassigned YARN-6746: -- Assignee: Deepti Sawhney (was: Daniel Templeton) > SchedulerUtils.checkResourceRequestMatchingNodePartition() is dead code > --- > > Key: YARN-6746 > URL: https://issues.apache.org/jira/browse/YARN-6746 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.1, 3.0.0-alpha3 >Reporter: Daniel Templeton >Assignee: Deepti Sawhney >Priority: Minor > Labels: newbie > > The function is unused. It also appears to be broken. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067360#comment-16067360 ] Sunil G commented on YARN-4161: --- I got your point. I was trying to point out an existing configuration in CS named *yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments*. I thought of keeping this syntax, as it seems similar in nature to me. bq. We still need assignMultiple one, as maxAssign=-1 mostly means (assignMultiple is enabled, and the max allowed allocation is unlimited). Thanks for clarifying. I understood the reason behind that. On another note, maxAssign=0 is meaningless. Correct? So a value less than 0 could be considered infinite, plus a boolean variable to switch the feature on/off. A few more doubts: # I think assignMultiple should not impact the existing "maximum-offswitch-assignments" feature. I can see that the current patch will skip it when assignMultiple is configured as false. # Could we rename assignMultiple to something like assign-multiple-containers-per-heartbeat? # Are you planning to consider this per queue as well, or only at the top CS level for now? > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacity scheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Labels: oct16-medium > Attachments: YARN-4161.patch, YARN-4161.patch.1 > > > Capacity Scheduler right now schedules multiple containers per heart beat if > there are more resources available on the node. > This approach works fine, however in some cases it does not distribute the load > across the cluster, hence the throughput of the cluster suffers. I am adding a > feature to drive that using configuration, so that we can control the number > of containers assigned per heart beat. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
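To make the semantics being debated concrete, here is a rough sketch of how the two knobs could interact in the CapacityScheduler's node-heartbeat loop. The names assignMultiple and maxAssign follow the FairScheduler-style configs discussed in this thread, and allocateContainerOnSingleNode stands in for the scheduler's real per-node allocation path; this is an illustration, not the actual patch.
{code}
// Sketch: limiting container assignments per node heartbeat.
int assignedContainers = 0;
while (true) {
  CSAssignment assignment = allocateContainerOnSingleNode(node);
  if (assignment == null
      || assignment.getResource().equals(Resources.none())) {
    break; // nothing more could be scheduled on this node
  }
  assignedContainers++;
  // assignMultiple == false => at most one container per heartbeat;
  // maxAssign <= 0 means "unlimited" when assignMultiple is true.
  if (!assignMultiple
      || (maxAssign > 0 && assignedContainers >= maxAssign)) {
    break;
  }
}
{code}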
[jira] [Created] (YARN-6747) TestFSAppStarvation.testPreemptionEnable fails intermittently
Sunil G created YARN-6747: - Summary: TestFSAppStarvation.testPreemptionEnable fails intermittently Key: YARN-6747 URL: https://issues.apache.org/jira/browse/YARN-6747 Project: Hadoop YARN Issue Type: Bug Reporter: Sunil G *Error Message* Apps re-added even before starvation delay passed expected:<4> but was:<3> *Stacktrace* {code} java.lang.AssertionError: Apps re-added even before starvation delay passed expected:<4> but was:<3> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation.testPreemptionEnabled(TestFSAppStarvation.java:117) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067322#comment-16067322 ] Hadoop QA commented on YARN-6678: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 59s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 36s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAsyncScheduling | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6678 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874072/YARN-6678.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 89756766d90e 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e9d8bdf | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/16267/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16267/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16267/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Committer thread crashes with IllegalStateException in async-scheduling mode > of CapacityScheduler > - > > Key: YARN-6678 > URL: https://issues.apache.org/jira/browse/YARN-6678 > Project: Hadoop YARN > Issue Type: Bug > Components:
[jira] [Commented] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067299#comment-16067299 ] Wei Yan commented on YARN-4161: --- Thanks for the comments, [~sunilg]. For the configuration names, I followed the same names in FairScheduler. Renaming them to different ones may leave users a little confused. We still need the assignMultiple one, as maxAssign=-1 mostly means (assignMultiple is enabled, and the max allowed allocation is unlimited). > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacity scheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Labels: oct16-medium > Attachments: YARN-4161.patch, YARN-4161.patch.1 > > > Capacity Scheduler right now schedules multiple containers per heart beat if > there are more resources available on the node. > This approach works fine, however in some cases it does not distribute the load > across the cluster, hence the throughput of the cluster suffers. I am adding a > feature to drive that using configuration, so that we can control the number > of containers assigned per heart beat. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6594) [API] Introduce SchedulingRequest object
[ https://issues.apache.org/jira/browse/YARN-6594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067273#comment-16067273 ] Jian He commented on YARN-6594: --- Got a question: any reason we add one more level of abstraction for {{ResourceSizing}}? If there's no logic implication behind it and it only serves as a simple wrapper class, maybe the existing flattened way is fine? One less abstraction layer in the API makes it easier for the caller to interact with. Btw, I'm going to use the APIs for the yarn native services as soon as they're ready. > [API] Introduce SchedulingRequest object > > > Key: YARN-6594 > URL: https://issues.apache.org/jira/browse/YARN-6594 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-6594.001.patch > > > This JIRA introduces a new SchedulingRequest object. > It will be part of the {{AllocateRequest}} and will be used to define sizing > (e.g., number of allocations, size of allocations) and placement constraints > for allocations. > Applications can use either this new object (when rich placement constraints > are required) or the existing {{ResourceRequest}} object. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
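To make the question concrete, the two shapes might look roughly like this from a caller's perspective. This is a hypothetical builder-style sketch based only on the JIRA description (number of allocations, allocation size, placement constraints); the actual patch API may differ.
{code}
// Wrapped form (as proposed): sizing lives in its own ResourceSizing object.
SchedulingRequest wrapped = SchedulingRequest.newBuilder()
    .resourceSizing(ResourceSizing.newInstance(numAllocations, resource))
    .placementConstraint(constraint)
    .build();

// Flattened form (one less layer for callers, per the comment above).
SchedulingRequest flattened = SchedulingRequest.newBuilder()
    .numAllocations(numAllocations)
    .resource(resource)
    .placementConstraint(constraint)
    .build();
{code}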
[jira] [Commented] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067270#comment-16067270 ] Sunil G commented on YARN-4161: --- Thanks [~ywskycn] for rebasing the patch. A few high-level comments: # Could you please name the patch something like YARN-4161.002.patch etc., following the naming convention? It will be easier. # I thought we could rename the configuration to "yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments". More thoughts are welcome. # Once 2 is confirmed, rename all variables and getters/setters in line with the config name. # I prefer not to have a config like "yarn.scheduler.capacity.assignmultiple". If we can make the default of "yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments" -1, we can assume the feature is off and continue to allocate as many containers as possible in a heartbeat. Once configured, we can assume the feature is on and restrict allocations as per the limit imposed. > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacity scheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Labels: oct16-medium > Attachments: YARN-4161.patch, YARN-4161.patch.1 > > > Capacity Scheduler right now schedules multiple containers per heart beat if > there are more resources available on the node. > This approach works fine, however in some cases it does not distribute the load > across the cluster, hence the throughput of the cluster suffers. I am adding a > feature to drive that using configuration, so that we can control the number > of containers assigned per heart beat. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4589) Diagnostics for localization timeouts is lacking
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067262#comment-16067262 ] Eric Badger commented on YARN-4589: --- It's on my list, but is a pretty low priority right now. I'm not sure what Chang's status is > Diagnostics for localization timeouts is lacking > > > Key: YARN-4589 > URL: https://issues.apache.org/jira/browse/YARN-4589 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4589.2.patch, YARN-4589.3.patch, YARN-4589.patch > > > When a container takes too long to localize it manifests as a timeout, and > there's no indication that localization was the issue. We need diagnostics > for timeouts to indicate the container was still localizing when the timeout > occurred. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before
[ https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067251#comment-16067251 ] Sunil G commented on YARN-6629: --- Thanks for the detailed analysis, [~Tao Yang]. A few questions: # Is this happening in trunk as well? # I understood why we return null when the schedulerKey is not found. It looks fine to me, as we can go ahead and create the container for the AM anyway. But the resource requests for that container will come back null either way. It looks like we handle that case fine. # Could you please help to write a test case that reproduces this issue in branch-2 or in trunk? > NPE occurred when container allocation proposal is applied but its resource > requests are removed before > --- > > Key: YARN-6629 > URL: https://issues.apache.org/jira/browse/YARN-6629 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Tao Yang >Assignee: Tao Yang > Attachments: YARN-6629.001.patch > > > I wrote a test case to reproduce another problem for branch-2 and found a new > NPE error; log: > {code} > FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in > handling event type NODE_UPDATE to the Event Dispatcher > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516) > at > org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225) > at > org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31) > at org.mockito.internal.MockHandler.handle(MockHandler.java:97) > at > org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply() > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66) > at java.lang.Thread.run(Thread.java:745) > {code} > Reproduce this error in chronological order: > 1. 
AM started and requested 1 container with schedulerRequestKey#1 : > ApplicationMasterService#allocate --> CapacityScheduler#allocate --> > SchedulerApplicationAttempt#updateResourceRequests --> > AppSchedulingInfo#updateResourceRequests > Added schedulerRequestKey#1 into schedulerKeyToPlacementSets > 2. Scheduler allocated 1 container for this request and accepted the proposal > 3. AM removed this request > ApplicationMasterService#allocate --> CapacityScheduler#allocate --> > SchedulerApplicationAttempt#updateResourceRequests --> > AppSchedulingInfo#updateResourceRequests --> > AppSchedulingInfo#addToPlacementSets --> > AppSchedulingInfo#updatePendingResources > Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets > 4. Scheduler applied this proposal > CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> > AppSchedulingInfo#allocate > Threw NPE when calling > schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, > type, node); -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail:
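A minimal defensive sketch of the guard implied by the chronology above: the placement-set lookup in AppSchedulingInfo#allocate must tolerate the key having been removed between proposal generation and commit. Class and field names are taken from the stack trace and description; the actual fix may instead reject the outdated proposal earlier in the commit path.
{code}
// Sketch: guard the lookup that NPEs in step 4 above.
SchedulingPlacementSet<FiCaSchedulerNode> ps =
    schedulerKeyToPlacementSets.get(schedulerRequestKey);
if (ps == null) {
  // The app removed its resource requests after the proposal was generated
  // but before it was committed; treat the proposal as outdated.
  LOG.warn("No placement set for " + schedulerRequestKey
      + ", skipping allocation for an outdated proposal");
  return null; // or otherwise signal that the proposal must be rejected
}
ps.allocate(schedulerKey, type, node);
{code}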
[jira] [Commented] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067231#comment-16067231 ] Hudson commented on YARN-6743: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11943 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11943/]) YARN-6743. yarn.resourcemanager.zk-max-znode-size.bytes description (templedf: rev 25d891a784304fcf02f57bc7984c31af45003553) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Fix For: 3.0.0-alpha4 > > Attachments: YARN-6743.001.patch, YARN-6743.002.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6746) SchedulerUtils.checkResourceRequestMatchingNodePartition() is dead code
Daniel Templeton created YARN-6746: -- Summary: SchedulerUtils.checkResourceRequestMatchingNodePartition() is dead code Key: YARN-6746 URL: https://issues.apache.org/jira/browse/YARN-6746 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 3.0.0-alpha3, 2.8.1 Reporter: Daniel Templeton Assignee: Daniel Templeton Priority: Minor The function is unused. It also appears to be broken. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4589) Diagnostics for localization timeouts is lacking
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067197#comment-16067197 ] Yufei Gu commented on YARN-4589: [~lichangleo] and [~ebadger], does anyone of you plan to work on this? > Diagnostics for localization timeouts is lacking > > > Key: YARN-4589 > URL: https://issues.apache.org/jira/browse/YARN-4589 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4589.2.patch, YARN-4589.3.patch, YARN-4589.patch > > > When a container takes too long to localize it manifests as a timeout, and > there's no indication that localization was the issue. We need diagnostics > for timeouts to indicate the container was still localizing when the timeout > occurred. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey
[ https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067193#comment-16067193 ] Ray Chiang commented on YARN-6150: --- The unit test failure looks identical to YARN-5728. > TestContainerManagerSecurity tests for Yarn Server are flakey > - > > Key: YARN-6150 > URL: https://issues.apache.org/jira/browse/YARN-6150 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Daniel Sturman >Assignee: Daniel Sturman > Attachments: YARN-6150.001.patch, YARN-6150.002.patch, > YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, > YARN-6150.006.patch, YARN-6150.007.patch > > > Repeated runs of > {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either > pass or fail on the same codebase. Also, the two runs (one > in secure mode, one without security) aren't well labeled in JUnit. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6714) RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067186#comment-16067186 ] Sunil G commented on YARN-6714: --- Test case failures are unrelated. Also raised a ticket to handle the FS test case failure. +1. I could commit this tomorrow if there are no objections. > RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED > event when async-scheduling enabled in CapacityScheduler > - > > Key: YARN-6714 > URL: https://issues.apache.org/jira/browse/YARN-6714 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Tao Yang >Assignee: Tao Yang > Attachments: YARN-6714.001.patch, YARN-6714.002.patch, > YARN-6714.003.patch > > > Currently in the async-scheduling mode of CapacityScheduler, after an AM fails over > and all of its reserved containers are unreserved, there is still a chance of getting and committing > an outdated reserve proposal of the failed app attempt. This problem > happened with an app in our cluster: when the app stopped, it unreserved all > reserved containers and compared their appAttemptId with the current > appAttemptId; on a mismatch it threw IllegalStateException and crashed the RM. > Error log: > {noformat} > 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ATTEMPT_REMOVED to the scheduler > java.lang.IllegalStateException: Trying to unreserve for application > appattempt_1495188831758_0121_02 when currently reserved for application > application_1495188831758_0121 on node host: node1:45454 #containers=2 > available=... used=... > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822) > at java.lang.Thread.run(Thread.java:834) > {noformat} > When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt and > CapacityScheduler#tryCommit both need to take the write_lock before executing, so > we can check the app attempt state in the commit process to avoid committing > outdated proposals. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
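A sketch of the check described in the last paragraph: since doneApplicationAttempt and tryCommit both take the scheduler write lock, tryCommit can verify that the proposal's attempt is still the current one before applying it. The way the attempt id is obtained from the commit request here is illustrative; only getApplicationAttempt() is a real scheduler accessor.
{code}
// Sketch: inside CapacityScheduler#tryCommit, under the scheduler write lock.
ApplicationAttemptId proposedAttemptId =
    request.getApplicationAttemptId(); // illustrative accessor
FiCaSchedulerApp app = getApplicationAttempt(proposedAttemptId);
if (app == null
    || !proposedAttemptId.equals(app.getApplicationAttemptId())) {
  // The attempt finished or failed over after the proposal was generated;
  // committing it now would corrupt node reservation state.
  return false;
}
{code}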
[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067179#comment-16067179 ] Sunil G commented on YARN-6678: --- Kicked Jenkins to see a clean run again. > Committer thread crashes with IllegalStateException in async-scheduling mode > of CapacityScheduler > - > > Key: YARN-6678 > URL: https://issues.apache.org/jira/browse/YARN-6678 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Tao Yang >Assignee: Tao Yang > Attachments: YARN-6678.001.patch, YARN-6678.002.patch, > YARN-6678.003.patch > > > Error log: > {noformat} > java.lang.IllegalStateException: Trying to reserve container > container_e10_1495599791406_7129_01_001453 for application > appattempt_1495599791406_7129_01 when currently reserved container > container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 > #containers=40 available=... used=... > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546) > {noformat} > Reproduce this problem: > 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1 > 2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and > allocated app-1/container-X2 > 3. nm1 reserved app-2/container-Y > 4. proposal-1 was accepted but threw IllegalStateException when applied > Currently the check code for the reserve proposal in FiCaSchedulerApp#accept is as > follows: > {code} > // Container reserved first time will be NEW, after the container > // accepted & confirmed, it will become RESERVED state > if (schedulerContainer.getRmContainer().getState() > == RMContainerState.RESERVED) { > // Set reReservation == true > reReservation = true; > } else { > // When reserve a resource (state == NEW is for new container, > // state == RUNNING is for increase container). > // Just check if the node is not already reserved by someone > if (schedulerContainer.getSchedulerNode().getReservedContainer() > != null) { > if (LOG.isDebugEnabled()) { > LOG.debug("Try to reserve a container, but the node is " > + "already reserved by another container=" > + schedulerContainer.getSchedulerNode() > .getReservedContainer().getContainerId()); > } > return false; > } > } > {code} > The reserved container on the node is checked only > for first-reserve containers. > We should also confirm that the container reserved on this node is equal to the re-reserved > container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey
[ https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067170#comment-16067170 ] Hadoop QA commented on YARN-6150: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 8s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: The patch generated 3 new + 27 unchanged - 11 fixed = 30 total (was 38) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 6s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 14s{color} | {color:red} hadoop-yarn-server-tests in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 19m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.TestMiniYarnClusterNodeUtilization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6150 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874927/YARN-6150.007.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 30687a497743 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f99b6d1 | | Default Java | 1.8.0_131 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16266/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/16266/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16266/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests U:
[jira] [Commented] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067155#comment-16067155 ] Daniel Templeton commented on YARN-6743: Latest patch LGTM. +1 > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Attachments: YARN-6743.001.patch, YARN-6743.002.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067137#comment-16067137 ] Hadoop QA commented on YARN-6743:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 21s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m 31s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6743 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874921/YARN-6743.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml |
| uname | Linux 6c5213816d83 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f99b6d1 |
| Default Java | 1.8.0_131 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16265/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16265/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated. > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Attachments: YARN-6743.001.patch, YARN-6743.002.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Commented] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066988#comment-16066988 ] Jonathan Hung commented on YARN-6492: - OK, sounds good. Looking forward to the patch! > Generate queue metrics for each partition > - > > Key: YARN-6492 > URL: https://issues.apache.org/jira/browse/YARN-6492 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Jonathan Hung >Assignee: Manikandan R > > We are interested in having queue metrics for all partitions. Right now each > queue has one QueueMetrics object which captures metrics either in default > partition or across all partitions. (After YARN-6467 it will be in default > partition) > But having the partition metrics would be very useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-6492: --- Assignee: Manikandan R (was: Naganarasimha G R) > Generate queue metrics for each partition > - > > Key: YARN-6492 > URL: https://issues.apache.org/jira/browse/YARN-6492 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Jonathan Hung >Assignee: Manikandan R > > We are interested in having queue metrics for all partitions. Right now each > queue has one QueueMetrics object which captures metrics either in default > partition or across all partitions. (After YARN-6467 it will be in default > partition) > But having the partition metrics would be very useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6708) Nodemanager container crash after ext3 folder limit
[ https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066975#comment-16066975 ] Jason Lowe commented on YARN-6708: -- bq. Tried this approach earlier; it didn't work out since DiskChecker#checkDir doesn't operate on a FileContext. Well if we choose to always create the leaf directory then that should solve that particular problem, since we can create it with FileContext to get the proper permissions. ContainerLocalizer is calling checkDir without an expected permission, and that doesn't create the directory if it already exists. Although as I mentioned before, I don't think it's really important to check the leaf directory permissions so much as the intermediate ones, since that's what's important for fixing this JIRA. > Nodemanager container crash after ext3 folder limit > --- > > Key: YARN-6708 > URL: https://issues.apache.org/jira/browse/YARN-6708 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-6708.001.patch, YARN-6708.002.patch, > YARN-6708.003.patch, YARN-6708.004.patch > > > Configure umask as *027* for nodemanager service user > and {{yarn.nodemanager.local-cache.max-files-per-directory}} as {{40}}. After > 4 *private* dir localization next directory will be *0/14* > Local Directory cache manager > {code} > vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l > total 28 > drwx--x--- 7 mapred hadoop 4096 Jun 10 14:35 ./ > drwxr-s--- 4 mapred hadoop 4096 Jun 10 12:07 ../ > drwxr-x--- 3 mapred users 4096 Jun 10 14:36 0/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:15 10/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:22 11/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:27 12/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:31 13/ > {code} > *drwxr-x---* 3 mapred users 4096 Jun 10 14:36 0/ is only *750* > Nodemanager user will not be able check for localization path exists or not. > {{LocalResourcesTrackerImpl}} > {code} > case REQUEST: > if (rsrc != null && (!isResourcePresent(rsrc))) { > LOG.info("Resource " + rsrc.getLocalPath() > + " is missing, localizing it again"); > removeResource(req); > rsrc = null; > } > if (null == rsrc) { > rsrc = new LocalizedResource(req, dispatcher); > localrsrc.put(req, rsrc); > } > break; > {code} > *isResourcePresent* will always return false and same resource will be > localized to {{0}} to next unique number -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
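A minimal sketch of the approach described above, creating the leaf directory through a FileContext so the permission is applied explicitly rather than derived from the NM process umask. The path and wrapper class here are made up for illustration; this is not the actual patch:
{code:java}
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class LeafDirSketch {
  public static void main(String[] args) throws Exception {
    FileContext lfc = FileContext.getLocalFSFileContext();
    // Clear the umask so the permission passed to mkdir is applied as-is.
    lfc.setUMask(new FsPermission((short) 0));
    // Hypothetical localization leaf directory.
    Path leaf = new Path("/tmp/nm-local-dir/usercache/mapred/filecache/0/14");
    // createParent = true also creates the missing intermediate directories.
    lfc.mkdir(leaf, new FsPermission((short) 0755), true);
  }
}
{code}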
[jira] [Updated] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey
[ https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-6150: - Attachment: YARN-6150.007.patch Adding Akira's suggested change. > TestContainerManagerSecurity tests for Yarn Server are flakey > - > > Key: YARN-6150 > URL: https://issues.apache.org/jira/browse/YARN-6150 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Daniel Sturman >Assignee: Daniel Sturman > Attachments: YARN-6150.001.patch, YARN-6150.002.patch, > YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, > YARN-6150.006.patch, YARN-6150.007.patch > > > Repeated runs of > {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either > pass or fail on the same codebase. Also, the two runs (one > in secure mode, one without security) aren't well labeled in JUnit. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lori Loberg updated YARN-6743: -- Attachment: YARN-6743.002.patch replaced tabs with spaces > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Attachments: YARN-6743.001.patch, YARN-6743.002.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066922#comment-16066922 ] Hadoop QA commented on YARN-6743:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 54s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 2 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 1s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6743 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874912/YARN-6743.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml |
| uname | Linux 293e08587d18 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ee243e5 |
| Default Java | 1.8.0_131 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/16264/artifact/patchprocess/whitespace-tabs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16264/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16264/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated. > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Attachments: YARN-6743.001.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail:
[jira] [Commented] (YARN-6708) Nodemanager container crash after ext3 folder limit
[ https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066845#comment-16066845 ] Bibin A Chundatt commented on YARN-6708: {quote} As far as the umask is concerned, you should be able to override it by passing a FileContext to the ContainerLocalizer where we've explicitly called setUMask on it to the desired umask you need for the test. {quote} Tried this approach earlier; it didn't work out since {{DiskChecker#checkDir}} doesn't operate on a {{FileContext}}. We can't change it to {quote} checkDir(LocalFileSystem localFS, Path dir, FsPermission expected) {quote} either, since {{DiskValidator.checkStatus}} doesn't have an interface that takes a {{FileContext}}. > Nodemanager container crash after ext3 folder limit > --- > > Key: YARN-6708 > URL: https://issues.apache.org/jira/browse/YARN-6708 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-6708.001.patch, YARN-6708.002.patch, > YARN-6708.003.patch, YARN-6708.004.patch > > > Configure umask as *027* for nodemanager service user > and {{yarn.nodemanager.local-cache.max-files-per-directory}} as {{40}}. After > 4 *private* dir localization next directory will be *0/14* > Local Directory cache manager > {code} > vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l > total 28 > drwx--x--- 7 mapred hadoop 4096 Jun 10 14:35 ./ > drwxr-s--- 4 mapred hadoop 4096 Jun 10 12:07 ../ > drwxr-x--- 3 mapred users 4096 Jun 10 14:36 0/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:15 10/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:22 11/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:27 12/ > drwxr-xr-x 3 mapred users 4096 Jun 10 12:31 13/ > {code} > *drwxr-x---* 3 mapred users 4096 Jun 10 14:36 0/ is only *750* > Nodemanager user will not be able check for localization path exists or not. > {{LocalResourcesTrackerImpl}} > {code} > case REQUEST: > if (rsrc != null && (!isResourcePresent(rsrc))) { > LOG.info("Resource " + rsrc.getLocalPath() > + " is missing, localizing it again"); > removeResource(req); > rsrc = null; > } > if (null == rsrc) { > rsrc = new LocalizedResource(req, dispatcher); > localrsrc.put(req, rsrc); > } > break; > {code} > *isResourcePresent* will always return false and same resource will be > localized to {{0}} to next unique number -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
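For reference, a small sketch of the checkDir variant mentioned above; it is driven by a LocalFileSystem, so a umask set on a FileContext has no effect on it (the path here is made up):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.DiskChecker;

public class CheckDirSketch {
  public static void main(String[] args) throws Exception {
    LocalFileSystem localFS = FileSystem.getLocal(new Configuration());
    // Creates the directory if missing and verifies it matches the expected
    // permission -- all through LocalFileSystem, not FileContext.
    DiskChecker.checkDir(localFS,
        new Path("/tmp/nm-local-dir/usercache/mapred/filecache/0"),
        new FsPermission((short) 0755));
  }
}
{code}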
[jira] [Updated] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lori Loberg updated YARN-6743: -- Attachment: YARN-6743.001.patch > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > Attachments: YARN-6743.001.patch > > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6708) Nodemanager container crash after ext3 folder limit
[ https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066725#comment-16066725 ] Jason Lowe commented on YARN-6708: -- bq. If Node manager service and users are in different group should be able to check the availability of existing cache folders during download and recovery. Right, I was confusing this with HDFS group behavior where we'd set up the parent directory to ensure the group of child directories is always one the NM participates in. As you say, we need it to be more permissive (at least 0711) so the NM will always be able to stat the localized resource. 0755 should be fine; the parent directory has already locked the filecache down to 710 anyway, so only the user and the NM's group can get in. bq. Already existing FSDownload code handles this case. What I meant by that comment is it's weird that we _sometimes_ create the destDirPath leaf directory before calling FSDownload. If the parent is the local cache root we do not and leave that for FSDownload to do, but if the parent isn't the root then we do create it. We should either always create it or never create it. For example, I think the code should be more like the following. Also note the simplified logic where the whole path object is pushed onto the stack, which saves having to manipulate the path when we pop things off the stack:
{code}
Path parent = destDirPath.getParent();
Path cacheRoot = LocalCacheDirectoryManager.getCacheDirectoryRoot(parent);
Stack<Path> dirs = new Stack<Path>();
while (parent != null && !parent.equals(cacheRoot)) {
  dirs.push(parent);
  parent = parent.getParent();
}
// Create sub directories with same permission
while (!dirs.isEmpty()) {
  createDir(lfs, dirs.pop(), USER_CACHE_FOLDER_PERMS);
}
{code}
That way we always leave it to FSDownload to create the leaf directory. If we'd rather always create it before calling FSDownload then we simply push destDirPath on the stack before the loop. bq. FSDownload should have been in nodemanager since it's tightly coupled to the directory permission wrt localization With respect to permissions on that directory, yes, although those directory permissions are a typical default and not necessarily specific to localization. It's currently used by code outside of the nodemanager and therefore a bit tricky to move without breaking things that may be using it. bq. FSDownload handles the final cache directory permissions. Even if 0/0/85 is created before download, in FSDownload for 85 the same could get reset rt?? FSDownload will reset an existing cache directory's permissions to 0755 iff the umask is too restrictive to allow 0755 by default. Either way the directory is going to be 0755 after FSDownload is done with it, and that's all we need. bq. The directory permission is 755 and in jenkins the umask is 022 to validate directory rights for code change used reflection. Container localizer USERCACHE permission could be package private but the above point of FSDownload will set the rights to 0755 or we should be checking only 0/0?? We should only be checking 0/0. We already know the leaf directory will be 0755 because FSDownload does that for us. The unit test can go ahead and verify 0/0/85 has the right permissions, but it will be the case even before the patch. Still probably worth it in case somehow they're not correct. What's critical is to test the permissions of 0/ and 0/0/.
As far as the umask is concerned, you should be able to override it by passing a FileContext to the ContainerLocalizer where we've explicitly called setUMask on it to the desired umask you need for the test. We should not modify constants in classes to execute this test, especially via reflection. That's very confusing and could break other tests in the same test suite that run afterwards since the classes have been scribbled upon. Please also address the checkstyle comments in the new patch. I'm guessing many of them will become moot as part of the unit test rewrite. > Nodemanager container crash after ext3 folder limit > --- > > Key: YARN-6708 > URL: https://issues.apache.org/jira/browse/YARN-6708 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-6708.001.patch, YARN-6708.002.patch, > YARN-6708.003.patch, YARN-6708.004.patch > > > Configure umask as *027* for nodemanager service user > and {{yarn.nodemanager.local-cache.max-files-per-directory}} as {{40}}. After > 4 *private* dir localization next directory will be *0/14* > Local Directory cache manager > {code} > vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l > total 28 >
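As a rough illustration of that suggestion, a test could build its own FileContext with the restrictive umask and hand it to the localizer under test, instead of rewriting class constants through reflection. The helper name below is invented for this sketch:
{code:java}
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.permission.FsPermission;

public class UmaskFixture {
  // Invented helper: returns a local FileContext whose umask is forced to
  // 027, mimicking the NM service user's environment for the test.
  public static FileContext localFcWithRestrictiveUmask() throws Exception {
    FileContext lfc = FileContext.getLocalFSFileContext();
    lfc.setUMask(new FsPermission((short) 027));
    return lfc;
  }
}
{code}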
[jira] [Assigned] (YARN-6744) Recover component information on YARN native services AM restart
[ https://issues.apache.org/jira/browse/YARN-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi reassigned YARN-6744: Assignee: (was: Billie Rinaldi) > Recover component information on YARN native services AM restart > > > Key: YARN-6744 > URL: https://issues.apache.org/jira/browse/YARN-6744 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Billie Rinaldi > Fix For: yarn-native-services > > > The new RoleInstance#Container constructor does not populate all the > information needed for a RoleInstance. This is the constructor used when > recovering running containers in AppState#addRestartedContainer. We will have > to figure out a way to determine this information for a running container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6593) [API] Introduce Placement Constraint object
[ https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1601#comment-1601 ] Konstantinos Karanasos commented on YARN-6593: -- Hi [~wangda], I will be traveling starting Monday and will get back to the code on July 17th. So there will be time for you to look at the patch -- just make sure you send me your comments by July 17th. I would also like to get feedback from [~chris.douglas] on the patch. > [API] Introduce Placement Constraint object > --- > > Key: YARN-6593 > URL: https://issues.apache.org/jira/browse/YARN-6593 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 3.0.0-alpha3 > > Attachments: YARN-6593.001.patch, YARN-6593.002.patch, > YARN-6593.003.patch, YARN-6593.004.patch > > > This JIRA introduces an object for defining placement constraints. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6720) Support updating FPGA related constraint node label after FPGA device re-configuration
[ https://issues.apache.org/jira/browse/YARN-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066397#comment-16066397 ] Zhongyue Nah commented on YARN-6720: [~leftnoteasy], YARN-3409 wouldn't be a blocker since this JIRA is an improvement of YARN-6507. We intend to move device metadata to global space so that the scheduler can make more efficient decisions in terms of IP reuse. I assume GPGPUs have similar issues, and we wish to find a common solution across all resource types before we get this through. > Support updating FPGA related constraint node label after FPGA device > re-configuration > -- > > Key: YARN-6720 > URL: https://issues.apache.org/jira/browse/YARN-6720 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang > Attachments: > Storing-and-Updating-extra-FPGA-resource-attributes-in-hdfs_v1.pdf > > > In order to provide a global optimal scheduling for mutable FPGA resource, it > seems an easy and direct way to utilize constraint node labels (YARN-3409) > instead of extending the global scheduler (YARN-3926) to match both resource > count and attributes. > The rough idea is that the AM sets the constraint node label expression to > request containers on the nodes whose FPGA devices have the matching IP, and > then the NM resource handler updates the node constraint label if there's an FPGA > device re-configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-4266) Allow whitelisted users to disable user re-mapping/squashing when launching docker containers
[ https://issues.apache.org/jira/browse/YARN-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066392#comment-16066392 ] Shane Kumpf edited comment on YARN-4266 at 6/28/17 12:16 PM: - Thanks for taking the time to answer my questions, [~ebadger]. I'm very interested in testing out the patch when it is ready. {quote}Yea I'm really not a fan either. I would strongly prefer a better, cleaner solution to this problem if there is one.{quote} The intent of YARN-5534 is to provide a mount whitelist, so I believe that should help here. The initial patch here could hard code the bind mount while we test and provide feedback. Hopefully we can leverage YARN-5534 before this is wrapped up. {quote}I'm looking into this. I'm hoping that we can get around this so that we can optionally add the bind mount, but not require it for the --user option. I have not yet tested other AMs.{quote} I don't think this is a requirement for the initial version. We could do a follow-on effort to remove/reduce the need for the bind mounted socket for a known list of AMs, assuming the behavior can be changed in those AMs. was (Author: shaneku...@gmail.com): Thanks for taking the time to answer my questions, [~ebadger]. I'm very interested in testing out the patch when it is ready. {quote}Yea I'm really not a fan either. I would strongly prefer a better, cleaner solution to this problem if there is one.{quote} The intent of YARN-5534 is to provide a mount whitelist, so I believe that should help here. The initial patch could hard code the bind mount while we test and provide feedback. Hopefully we can leverage YARN-5534 before this is wrapped up. {quote}I'm looking into this. I'm hoping that we can get around this so that we can optionally add the bind mount, but not require it for the --user option. I have not yet tested other AMs.{quote} I don't think this is a requirement for the initial version. We could do a follow-on effort to remove/reduce the need for the bind mounted socket for a known list of AMs, assuming the behavior can be changed in those AMs. > Allow whitelisted users to disable user re-mapping/squashing when launching > docker containers > - > > Key: YARN-4266 > URL: https://issues.apache.org/jira/browse/YARN-4266 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: luhuichun > Attachments: YARN-4266.001.patch, YARN-4266.001.patch, > YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping.pdf, > YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping_v2.pdf, > YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping_v3.pdf, > YARN-4266-branch-2.8.001.patch > > > Docker provides a mechanism (the --user switch) that enables us to specify > the user the container processes should run as. We use this mechanism today > when launching docker containers . In non-secure mode, we run the docker > container based on > `yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user` and in > secure mode, as the submitting user. However, this mechanism breaks down with > a large number of 'pre-created' images which don't necessarily have the users > available within the image. Examples of such images include shared images > that need to be used by multiple users. We need a way in which we can allow a > pre-defined set of users to run containers based on existing images, without > using the --user switch. 
There are some implications of disabling this user > squashing that we'll need to work through : log aggregation, artifact > deletion etc., -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-4266) Allow whitelisted users to disable user re-mapping/squashing when launching docker containers
[ https://issues.apache.org/jira/browse/YARN-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066392#comment-16066392 ] Shane Kumpf edited comment on YARN-4266 at 6/28/17 12:16 PM: - Thanks for taking the time to answer my questions, [~ebadger]. I'm very interested in testing out the patch when it is ready. {quote}Yea I'm really not a fan either. I would strongly prefer a better, cleaner solution to this problem if there is one.{quote} The intent of YARN-5534 is to provide a mount whitelist, so I believe that should help here. The initial patch could hard code the bind mount while we test and provide feedback. Hopefully we can leverage YARN-5534 before this is wrapped up. {quote}I'm looking into this. I'm hoping that we can get around this so that we can optionally add the bind mount, but not require it for the --user option. I have not yet tested other AMs.{quote} I don't think this is a requirement for the initial version. We could do a follow-on effort to remove/reduce the need for the bind mounted socket for a known list of AMs, assuming the behavior can be changed in those AMs. was (Author: shaneku...@gmail.com): Thanks for taking the time to answer my questions, [~ebadger]. I'm very interested in testing out the patch when it is ready. {quote}Yea I'm really not a fan either. I would strongly prefer a better, cleaner solution to this problem if there is one.{quote} The intent of YARN-5534 is to provide a mount whitelist, so I believe that should help here. The initial patch could hard code the bind mount while we test and provide feedback. Hopefully we can leverage YARN-5534 before this is wrapped up. {quote}I'm looking into this. I'm hoping that we can get around this so that we can optionally add the bind mount, but not require it for the --user option. I have not yet tested other AMs.{quote} I don't think this is a requirement for the initial version. We could do a follow-on effort to remove/reduce the need for the bind mounted socket for a known list of AMs, assuming the behavior can be changed in those AMs. > Allow whitelisted users to disable user re-mapping/squashing when launching > docker containers > - > > Key: YARN-4266 > URL: https://issues.apache.org/jira/browse/YARN-4266 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: luhuichun > Attachments: YARN-4266.001.patch, YARN-4266.001.patch, > YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping.pdf, > YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping_v2.pdf, > YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping_v3.pdf, > YARN-4266-branch-2.8.001.patch > > > Docker provides a mechanism (the --user switch) that enables us to specify > the user the container processes should run as. We use this mechanism today > when launching docker containers . In non-secure mode, we run the docker > container based on > `yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user` and in > secure mode, as the submitting user. However, this mechanism breaks down with > a large number of 'pre-created' images which don't necessarily have the users > available within the image. Examples of such images include shared images > that need to be used by multiple users. We need a way in which we can allow a > pre-defined set of users to run containers based on existing images, without > using the --user switch. 
There are some implications of disabling this user > squashing that we'll need to work through : log aggregation, artifact > deletion etc., -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4266) Allow whitelisted users to disable user re-mapping/squashing when launching docker containers
[ https://issues.apache.org/jira/browse/YARN-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066392#comment-16066392 ] Shane Kumpf commented on YARN-4266: --- Thanks for taking the time to answer my questions, [~ebadger]. I'm very interested in testing out the patch when it is ready. {quote}Yea I'm really not a fan either. I would strongly prefer a better, cleaner solution to this problem if there is one.{quote} The intent of YARN-5534 is to provide a mount whitelist, so I believe that should help here. The initial patch could hard code the bind mount while we test and provide feedback. Hopefully we can leverage YARN-5534 before this is wrapped up. {quote}I'm looking into this. I'm hoping that we can get around this so that we can optionally add the bind mount, but not require it for the --user option. I have not yet tested other AMs.{quote} I don't think this is a requirement for the initial version. We could do a follow-on effort to remove/reduce the need for the bind mounted socket for a known list of AMs, assuming the behavior can be changed in those AMs. > Allow whitelisted users to disable user re-mapping/squashing when launching > docker containers > - > > Key: YARN-4266 > URL: https://issues.apache.org/jira/browse/YARN-4266 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: luhuichun > Attachments: YARN-4266.001.patch, YARN-4266.001.patch, > YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping.pdf, > YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping_v2.pdf, > YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping_v3.pdf, > YARN-4266-branch-2.8.001.patch > > > Docker provides a mechanism (the --user switch) that enables us to specify > the user the container processes should run as. We use this mechanism today > when launching docker containers . In non-secure mode, we run the docker > container based on > `yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user` and in > secure mode, as the submitting user. However, this mechanism breaks down with > a large number of 'pre-created' images which don't necessarily have the users > available within the image. Examples of such images include shared images > that need to be used by multiple users. We need a way in which we can allow a > pre-defined set of users to run containers based on existing images, without > using the --user switch. There are some implications of disabling this user > squashing that we'll need to work through : log aggregation, artifact > deletion etc., -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6745) Cannot parse correct Spark 2.x jars classpath in YARN on Windows
Yun Tang created YARN-6745: -- Summary: Cannot parse correct Spark 2.x jars classpath in YARN on Windows Key: YARN-6745 URL: https://issues.apache.org/jira/browse/YARN-6745 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.7.2 Environment: Windows cluster, Yarn-2.7.2 Reporter: Yun Tang When submitting Spark 2.x applications to a YARN cluster on Windows, we found two errors: # If [dynamic resource allocation|https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation] is enabled for Spark, we get exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.JavaUtils.byteStringAs(Ljava/lang/String;Lorg/apache/spark/network/util/ByteUnit) # We cannot open the running Spark application's web UI Both errors come from YARN failing to parse the Spark 2.x jars wildcard classpath correctly on Windows. I checked the latest code from hadoop-3.x; this part of the code has not changed and would cause the error again. A typical appcache folder for running a Spark executor/driver in our Windows YARN cluster looks like below: !http://wx1.sinaimg.cn/large/62eae5a9gy1fh14j38zvbj20bb0990tm.jpg! The '__spark_libs__' link folder points to a filecache folder containing the jars Spark 2+ needs. The classpath-xxx.jar contains a manifest file listing the runtime classpath, to work around the 8k maximum command-line length problem on Windows (https://issues.apache.org/jira/browse/YARN-358). The 'launch_container.cmd' is the script that starts the YARN container; note that the '__spark_conf__', '__spark_libs__' and '__app__.jar' shortcuts are only created after launch_container.cmd runs. = The typical CLASSPATH of hadoop-2.7.2 in launch_container.cmd looks like below: !http://wx4.sinaimg.cn/large/62eae5a9gy1fh14j2c801j20sh023weh.jpg! The 'classpath-3177336218981224920.jar' contains a manifest file listing all the hadoop runtime jars, in which we could find spark-1.6.2-nao-yarn-shuffle.jar and servlet-api-2.5.jar. Both problems are due to the Java runtime loading classes from those two old jars first: the Spark 1.x external shuffle service is not compatible with Spark 2.x, and servlet-api-2.x is not compatible with servlet-api-3.x (used in Spark 2). That is to say, the "xxx/__spark_libs__/*" entry should be placed before the classpath jar. OK, let's see what the CLASSPATH is on Linux. = The classpath in launch_container.sh looks like: !http://wx2.sinaimg.cn/large/62eae5a9gy1fh14ivycpxj20um01tjre.jpg! We can see the "xxx/__spark_libs__/*" entry placed before the Hadoop jars, so problems #1 and #2 do not happen in a Linux environment. *Root cause*: the whole process has two steps. 1. {color:blue}org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch{color} transforms the original CLASSPATH into the classpath jar in its 'sanitizeEnv' method. The CLASSPATH is: {code:java} %PWD%;%PWD%/__spark_conf__;%PWD%/__app__.jar;%PWD%/__spark_libs__/*;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/*;%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\*;%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\lib\*; {code} Within this method, it calls the 'createJarWithClassPath' method of {color:blue}org.apache.hadoop.fs.FileUtil{color}. 2. For the wildcard path, {color:blue}org.apache.hadoop.fs.FileUtil{color} finds the files in that folder with a suffix of 'jar' or 'JAR'. The earlier %PWD%/__spark_libs__/* is transformed into {code:java} D:/Data/Yarn/nm-local-dir/usercache/xxx/appcache/application_1494151518127_0073/container_e3752_1494151518127_0073_01_01/__spark_libs__/* . {code} However, this folder does not exist when the classpath jar is generated; only after 'launch_container.cmd' runs does the '__spark_libs__' folder appear in the current directory. As a result, YARN puts the "xxx/__spark_libs__/*" classpath into unexpandedWildcardClasspath, and since unexpandedWildcardClasspath is placed after the classpath jar in CLASSPATH, that's why we see "xxx/__spark_libs__/*" located at the end. In other words, the correct behavior is either to place "xxx/__spark_libs__/*" before the classpath jar, just like the Linux case, or to expand the "xxx/__spark_libs__/xxx.jar" entries into the classpath jar; fixing the current wrong order would satisfy the original design. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
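For context, a condensed sketch of what such a classpath jar amounts to. This is a simplification, not the FileUtil.createJarWithClassPath code, and the jar names are placeholders; the key point is that the Class-Path manifest entries must be concrete at creation time, which is exactly what fails when the wildcard folder does not exist yet:
{code:java}
import java.io.FileOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarSketch {
  public static void main(String[] args) throws Exception {
    Manifest manifest = new Manifest();
    Attributes attrs = manifest.getMainAttributes();
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    // Space-separated relative URLs; wildcards are NOT honored here, so
    // they must be expanded to real jar names before this jar is built.
    attrs.put(Attributes.Name.CLASS_PATH,
        "__spark_libs__/example-a.jar __spark_libs__/example-b.jar");
    // An otherwise empty jar whose manifest carries the classpath,
    // sidestepping the ~8k command-line length limit on Windows.
    try (JarOutputStream jos =
        new JarOutputStream(new FileOutputStream("classpath-sketch.jar"), manifest)) {
      // no entries needed; the manifest is the payload
    }
  }
}
{code}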
[jira] [Commented] (YARN-6714) RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066312#comment-16066312 ] Hadoop QA commented on YARN-6714:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 39s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 56s{color} | {color:black} {color} |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation |
| | hadoop.yarn.server.resourcemanager.TestRMRestart |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6714 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874225/YARN-6714.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 08a97adf2341 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ee243e5 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/16263/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16263/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16263/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated. > RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED > event when async-scheduling enabled in CapacityScheduler > - > > Key: YARN-6714 > URL: https://issues.apache.org/jira/browse/YARN-6714 > Project: Hadoop YARN >
[jira] [Commented] (YARN-6743) yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066184#comment-16066184 ] Naganarasimha G R commented on YARN-6743: - [~l...@cloudera.com], It's a simple fix; would you mind uploading the patch at the earliest? > yarn.resourcemanager.zk-max-znode-size.bytes description needs spaces in > yarn-default.xml > - > > Key: YARN-6743 > URL: https://issues.apache.org/jira/browse/YARN-6743 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Lori Loberg >Priority: Trivial > Labels: newbie > > The description says: > {noformat} > Specifies the maximum size of the data that can be stored > in a znode.Value should be same or less than jute.maxbuffer > configured > in zookeeper.Default value configured is 1MB. > {noformat} > There should be spaces before "Value" and "Default". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6467) CSQueueMetrics needs to update the current metrics for default partition only
[ https://issues.apache.org/jira/browse/YARN-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066063#comment-16066063 ] Hadoop QA commented on YARN-6467:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 41s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 10 new + 218 unchanged - 14 fixed = 228 total (was 232) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 29s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}166m 41s{color} | {color:black} {color} |

|| Reason || Tests ||
| JDK v1.8.0_131 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.scheduler.TestAppSchedulingInfo |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_131 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.scheduler.TestAppSchedulingInfo |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:d946387 |
| JIRA Issue | YARN-6467 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874805/YARN-6467-branch-2.8.011.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux d0f02bb49b74 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20
[jira] [Commented] (YARN-6742) Minor mistakes in "The YARN Service Registry" docs
[ https://issues.apache.org/jira/browse/YARN-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066034#comment-16066034 ] Yeliang Cang commented on YARN-6742: [~ste...@apache.org], what is your opinion? Thank you! > Minor mistakes in "The YARN Service Registry" docs > -- > > Key: YARN-6742 > URL: https://issues.apache.org/jira/browse/YARN-6742 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha3 >Reporter: Yeliang Cang >Assignee: Yeliang Cang >Priority: Trivial > Attachments: YARN-6742-001.patch > > > There are minor mistakes in The YARN Service Registry docs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6738) LevelDBCacheTimelineStore should reuse ObjectMapper instances
[ https://issues.apache.org/jira/browse/YARN-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065963#comment-16065963 ] Zoltan Haindrich commented on YARN-6738: Thank you for adding me to the contributor list... and for reviewing and committing this change! > LevelDBCacheTimelineStore should reuse ObjectMapper instances > - > > Key: YARN-6738 > URL: https://issues.apache.org/jira/browse/YARN-6738 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich > Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2 > > Attachments: Screen Shot 2017-06-23 at 2.43.06 PM.png, > YARN-6738.1.patch, YARN-6738.2.patch > > > Using the Tez UI sometimes times out... and the cause of it was that the query was > quite large, and the leveldb handler seems to recreate the > {{ObjectMapper}} for every read. This is unfortunate, > since the ObjectMapper has to rescan the class annotations, which may take some time. > Keeping the ObjectMapper reduces the ATS call time from 17 seconds to 3 > seconds for me... which was enough to get my tez-ui working again :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
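The idea behind the fix, sketched here with the Jackson 2 API purely for illustration (the actual patch is against the timeline store's own Jackson usage):
{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;

public class MapperReuseSketch {
  // One shared, thread-safe ObjectMapper: annotation introspection is
  // done once and cached across reads.
  private static final ObjectMapper MAPPER = new ObjectMapper();

  public static <T> T read(byte[] json, Class<T> type) throws java.io.IOException {
    // Anti-pattern: new ObjectMapper().readValue(json, type) on every read
    // rebuilds those caches each time, which is what slowed down large
    // ATS queries.
    return MAPPER.readValue(json, type);
  }
}
{code}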