[jira] [Commented] (YARN-7454) RMAppAttemptMetrics#getAggregateResourceUsage can NPE due to double lookup
[ https://issues.apache.org/jira/browse/YARN-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243535#comment-16243535 ]

Bibin A Chundatt commented on YARN-7454:
----------------------------------------

+1 LGTM

> RMAppAttemptMetrics#getAggregateResourceUsage can NPE due to double lookup
> --------------------------------------------------------------------------
>
>                 Key: YARN-7454
>                 URL: https://issues.apache.org/jira/browse/YARN-7454
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Minor
>         Attachments: YARN-7454.001.patch
>
> RMAppAttemptMetrics#getAggregateResourceUsage does a double-lookup on a
> concurrent hash map, but the app could be removed from the map between the
> two lookups:
> {code}
> RMApp rmApp = rmContext.getRMApps().get(attemptId.getApplicationId());
> if (rmApp != null) {
>   RMAppAttempt currentAttempt =
>       rmContext.getRMApps().get(attemptId.getApplicationId()).getCurrentAppAttempt();
> {code}
> The attempt should be looked up within rmApp directly rather than redundantly
> trying to retrieve the RMApp first.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
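The safe pattern the description calls for - one map lookup, then deriving the attempt from the held reference - can be sketched as below. The RMApp/RMAppAttempt stand-ins here are simplified placeholders for illustration, not the actual ResourceManager classes:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal stand-ins for the RM types, for illustration only.
class SingleLookupSketch {
    static class RMAppAttempt { }
    static class RMApp {
        private final RMAppAttempt current = new RMAppAttempt();
        RMAppAttempt getCurrentAppAttempt() { return current; }
    }

    // Stand-in for rmContext.getRMApps().
    static final ConcurrentHashMap<String, RMApp> apps = new ConcurrentHashMap<>();

    // Look the app up once and hold the reference. A concurrent remove between
    // two separate map lookups can no longer NPE, because there is no second lookup.
    static RMAppAttempt currentAttemptOf(String appId) {
        RMApp rmApp = apps.get(appId);
        return (rmApp != null) ? rmApp.getCurrentAppAttempt() : null;
    }
}
```

The null check then covers every later dereference, which the original double-lookup version could not guarantee.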
[jira] [Comment Edited] (YARN-7454) RMAppAttemptMetrics#getAggregateResourceUsage can NPE due to double lookup
[ https://issues.apache.org/jira/browse/YARN-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243535#comment-16243535 ]

Bibin A Chundatt edited comment on YARN-7454 at 11/8/17 8:47 AM:
----------------------------------------------------------------

+1 LGTM. The test case needs to be rechecked. Will trigger Jenkins again.

was (Author: bibinchundatt): +1 LGTM
[jira] [Updated] (YARN-7119) yarn rmadmin -updateNodeResource should be updated for resource types
[ https://issues.apache.org/jira/browse/YARN-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-7119: --- Attachment: YARN-7119.004.patch > yarn rmadmin -updateNodeResource should be updated for resource types > - > > Key: YARN-7119 > URL: https://issues.apache.org/jira/browse/YARN-7119 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Manikandan R > Attachments: YARN-7119.001.patch, YARN-7119.002.patch, > YARN-7119.002.patch, YARN-7119.003.patch, YARN-7119.004.patch, > YARN-7119.004.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7159) Normalize unit of resource objects in RM and avoid to do unit conversion in critical path
[ https://issues.apache.org/jira/browse/YARN-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-7159: --- Attachment: YARN-7159.016.patch > Normalize unit of resource objects in RM and avoid to do unit conversion in > critical path > - > > Key: YARN-7159 > URL: https://issues.apache.org/jira/browse/YARN-7159 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Wangda Tan >Assignee: Manikandan R >Priority: Critical > Attachments: YARN-7159.001.patch, YARN-7159.002.patch, > YARN-7159.003.patch, YARN-7159.004.patch, YARN-7159.005.patch, > YARN-7159.006.patch, YARN-7159.007.patch, YARN-7159.008.patch, > YARN-7159.009.patch, YARN-7159.010.patch, YARN-7159.011.patch, > YARN-7159.012.patch, YARN-7159.013.patch, YARN-7159.015.patch, > YARN-7159.016.patch > > > Currently resource conversion could happen in critical code path when > different unit is specified by client. This could impact performance and > throughput of RM a lot. We should do unit normalization when resource passed > to RM and avoid expensive unit conversion every time. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release
[ https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243363#comment-16243363 ]

Ted Yu commented on YARN-7346:
------------------------------

I am not sure a different folder helps. As long as mapreduce.tar.gz, containing un-relocated hbase jars, is on the classpath for (hbase) mapreduce jobs, we may see problems, e.g. HBASE-19169.

> Fix compilation errors against hbase2 alpha release
> ---------------------------------------------------
>
>                 Key: YARN-7346
>                 URL: https://issues.apache.org/jira/browse/YARN-7346
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Ted Yu
>            Assignee: Vrushali C
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3,
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.
[jira] [Created] (YARN-7463) Using getLocalPathForWrite for Container related debug information
Prabhu Joseph created YARN-7463:
-----------------------------------

             Summary: Using getLocalPathForWrite for Container related debug information
                 Key: YARN-7463
                 URL: https://issues.apache.org/jira/browse/YARN-7463
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.7.3
            Reporter: Prabhu Joseph
            Assignee: Prabhu Joseph
            Priority: Minor

The container debug files launch_container.sh and directory.info are always written into the first directory of NM_LOG_DIRS instead of the log directory returned from getLocalPathForWrite.
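The behavior the issue asks for - writing the debug files into the directory selected for this container's logs rather than unconditionally into the first NM_LOG_DIRS entry - can be sketched roughly as below. LogDirChooser and its round-robin policy are illustrative assumptions, not the actual NodeManager allocator:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch contrasting the reported bug (always the first log dir)
// with asking an allocator which directory to use for this write.
class LogDirChooser {
    private final List<String> logDirs;
    private final AtomicInteger next = new AtomicInteger();

    LogDirChooser(List<String> logDirs) { this.logDirs = logDirs; }

    // Buggy behavior described in the issue: hard-coded first directory.
    String firstDirAlways() { return logDirs.get(0); }

    // Intended behavior: let the allocator pick, modeled here as round-robin
    // so writes spread across all configured log directories.
    String pathForWrite() {
        return logDirs.get(Math.floorMod(next.getAndIncrement(), logDirs.size()));
    }
}
```

With a hard-coded first directory, a full or failed first disk breaks debug logging even when other healthy log directories exist.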
[jira] [Updated] (YARN-7464) Allow fiters on Nodes page
[ https://issues.apache.org/jira/browse/YARN-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasudevan Skm updated YARN-7464: Attachment: YARN-7464.001.patch Screen Shot 2017-11-08 at 4.56.04 PM.png Screen Shot 2017-11-08 at 4.56.12 PM.png [~sunil.gov...@gmail.com] > Allow fiters on Nodes page > -- > > Key: YARN-7464 > URL: https://issues.apache.org/jira/browse/YARN-7464 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Vasudevan Skm >Assignee: Vasudevan Skm > Attachments: Screen Shot 2017-11-08 at 4.56.04 PM.png, Screen Shot > 2017-11-08 at 4.56.12 PM.png, YARN-7464.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value
Tao Yang created YARN-7461:
------------------------------

             Summary: DominantResourceCalculator#ratio calculation problem when right resource contains zero value
                 Key: YARN-7461
                 URL: https://issues.apache.org/jira/browse/YARN-7461
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.0.0-alpha4
            Reporter: Tao Yang
            Priority: Minor

Currently DominantResourceCalculator#ratio may return a wrong result when the right resource contains a zero value. For example, with three resource types, leftResource=<5, 5, 0> and rightResource=<10, 10, 0>, we expect DominantResourceCalculator#ratio(leftResource, rightResource) to return 0.5, but it currently returns NaN. There should be a check before the division to ensure the divisor is not zero.
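The proposed guard can be sketched as below. This is an assumed, simplified shape of the ratio computation (resources reduced to long arrays), not the actual DominantResourceCalculator code:

```java
// Sketch: a dominant-resource ratio that skips resource types whose
// right-hand value is zero, so <5,5,0> / <10,10,0> yields 0.5 instead of
// NaN from the 0/0 division in floating point.
class RatioSketch {
    static float ratio(long[] left, long[] right) {
        float max = 0.0f;
        for (int i = 0; i < left.length; i++) {
            if (right[i] == 0) {
                continue; // guard: avoid 0/0 -> NaN (and x/0 -> Infinity)
            }
            max = Math.max(max, (float) left[i] / right[i]);
        }
        return max;
    }
}
```

Without the guard, a single NaN term poisons the max over all resource types, since Math.max propagates NaN.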
[jira] [Updated] (YARN-7453) RM fail to switch to active after first successful start
[ https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-7453: -- Attachment: YARN-7453.001.patch Reverted YARN-6840's ResourceManager and ZKRMStateStore changes. This solves the issue for now. Detailed analysis will be shared a bit later > RM fail to switch to active after first successful start > > > Key: YARN-7453 > URL: https://issues.apache.org/jira/browse/YARN-7453 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.1.0 >Reporter: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-7453.001.patch > > > It is observed that RM fail to switch to ACTIVE after first successful start! > The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. > This continues in loop! > {noformat} > 2017-11-07 15:08:11,664 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to active state > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery > started > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded > RM state version info 1.5 > 2017-11-07 15:08:11,670 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > 
org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403) > at > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607) > at 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7454) RMAppAttemptMetrics#getAggregateResourceUsage can NPE due to double lookup
[ https://issues.apache.org/jira/browse/YARN-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243820#comment-16243820 ] Hadoop QA commented on YARN-7454: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 4 unchanged - 1 fixed = 4 total (was 5) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 53m 7s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 98m 11s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7454 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896453/YARN-7454.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5f6cae70a458 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e4c220e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18398/testReport/ | | Max. process+thread count | 810 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-7406) Moving logging APIs over to slf4j in hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243700#comment-16243700 ] Akira Ajisaka commented on YARN-7406: - LGTM, +1 > Moving logging APIs over to slf4j in hadoop-yarn-api > > > Key: YARN-7406 > URL: https://issues.apache.org/jira/browse/YARN-7406 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Yeliang Cang >Assignee: Yeliang Cang > Attachments: YARN-7406.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7462) Render outstanding resource requests on application details page
[ https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasudevan Skm updated YARN-7462: Attachment: YARN-7462.002.patch Fixes the typo in the previous patch [~sunil.gov...@gmail.com] > Render outstanding resource requests on application details page > > > Key: YARN-7462 > URL: https://issues.apache.org/jira/browse/YARN-7462 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Vasudevan Skm >Assignee: Vasudevan Skm > Attachments: Screen Shot 2017-11-08 at 3.24.30 PM.png, > YARN-7462.001.patch, YARN-7462.002.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7462) Render outstanding resource requests on application details page
[ https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243638#comment-16243638 ] Vasudevan Skm commented on YARN-7462: - [~sunil.gov...@gmail.com][~wangda] > Render outstanding resource requests on application details page > > > Key: YARN-7462 > URL: https://issues.apache.org/jira/browse/YARN-7462 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Vasudevan Skm >Assignee: Vasudevan Skm > Attachments: YARN-7462.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7462) Render outstanding resource requests on application details page
[ https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasudevan Skm updated YARN-7462: Attachment: YARN-7462.001.patch > Render outstanding resource requests on application details page > > > Key: YARN-7462 > URL: https://issues.apache.org/jira/browse/YARN-7462 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Vasudevan Skm >Assignee: Vasudevan Skm > Attachments: YARN-7462.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7462) Render outstanding resource requests on application details page
[ https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasudevan Skm updated YARN-7462: Attachment: Screen Shot 2017-11-08 at 3.24.30 PM.png > Render outstanding resource requests on application details page > > > Key: YARN-7462 > URL: https://issues.apache.org/jira/browse/YARN-7462 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Vasudevan Skm >Assignee: Vasudevan Skm > Attachments: Screen Shot 2017-11-08 at 3.24.30 PM.png, > YARN-7462.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value
[ https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-7461: --- Attachment: YARN-7461.001.patch > DominantResourceCalculator#ratio calculation problem when right resource > contains zero value > > > Key: YARN-7461 > URL: https://issues.apache.org/jira/browse/YARN-7461 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Tao Yang >Priority: Minor > Attachments: YARN-7461.001.patch > > > Currently DominantResourceCalculator#ratio may return wrong result when right > resource contains zero value. For example, there are three resource types > such as, leftResource=<5, 5, 0> and > rightResource=<10, 10, 0>, we expect the result of > DominantResourceCalculator#ratio(leftResource, rightResource) is 0.5 but > currently is NaN. > There should be a verification before divide calculation to ensure that > dividend is not zero. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7462) Render outstanding resource requests on application details page
Vasudevan Skm created YARN-7462: --- Summary: Render outstanding resource requests on application details page Key: YARN-7462 URL: https://issues.apache.org/jira/browse/YARN-7462 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Vasudevan Skm Assignee: Vasudevan Skm -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7462) Render outstanding resource requests on application details page
[ https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasudevan Skm updated YARN-7462: Attachment: Screen Shot 2017-11-08 at 3.38.48 PM.png > Render outstanding resource requests on application details page > > > Key: YARN-7462 > URL: https://issues.apache.org/jira/browse/YARN-7462 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Vasudevan Skm >Assignee: Vasudevan Skm > Attachments: Screen Shot 2017-11-08 at 3.24.30 PM.png, Screen Shot > 2017-11-08 at 3.38.48 PM.png, YARN-7462.001.patch, YARN-7462.002.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7464) Allow fiters on Nodes page
Vasudevan Skm created YARN-7464: --- Summary: Allow fiters on Nodes page Key: YARN-7464 URL: https://issues.apache.org/jira/browse/YARN-7464 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Vasudevan Skm Assignee: Vasudevan Skm -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7464) Allow fiters on Nodes page
[ https://issues.apache.org/jira/browse/YARN-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243798#comment-16243798 ] Hadoop QA commented on YARN-7464: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 25m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 57s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 37m 36s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7464 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896635/YARN-7464.001.patch | | Optional Tests | asflicense shadedclient | | uname | Linux eaa38cb8eb1c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e4c220e | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 402 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18400/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Allow fiters on Nodes page > -- > > Key: YARN-7464 > URL: https://issues.apache.org/jira/browse/YARN-7464 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Vasudevan Skm >Assignee: Vasudevan Skm > Attachments: Screen Shot 2017-11-08 at 4.56.04 PM.png, Screen Shot > 2017-11-08 at 4.56.12 PM.png, YARN-7464.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243982#comment-16243982 ]

Shane Kumpf commented on YARN-7430:
-----------------------------------

{quote}
This is not true, see the following examples:
{quote}
I guess I don't understand what those examples are trying to convey. I become root with or without privileged if I don't supply the --user/uid flags. The centos image has no USER entry, so this is what I would expect.
{code}
[foo@localhost ~]$ docker run -it centos:latest bash
[root@00f0c3ac84cf /]# id
uid=0(root) gid=0(root) groups=0(root)

[foo@localhost ~]$ docker run -it --privileged centos:latest bash
[root@955eb326cb66 /]# id
uid=0(root) gid=0(root) groups=0(root)
{code}
With user remapping disabled, which is the default, the {{docker run}} form is different from what you are testing. It is {{docker run --detach=true --user= ...}} (not --user=) - that doesn't seem to suffer from the issue you call out where the primary group is missing, since the container fails to start if the user doesn't exist in the container.
{code:java}
[foo@localhost ~]$ docker run -it --user=foo centos:latest bash
docker: Error response from daemon: linux spec user: unable to find user foo: no matching entries in passwd file
{code}
At this point, I'm confused on exactly what conditions result in this exploit. Can you clarify? I've yet to see the form you tested occur anywhere. I see the following:
* Without user remapping: docker run --user='skumpf' ...
* With user remapping: docker run --user='501:502' --group-add='502' ...
{quote}
When --privileged=true and --user are set, the container is started with root privileges and drops to the user privileges. If there is a sticky-bit binary in the container file system, it is possible for a process to resume root privileges. If the container filesystem can be tainted by pushing a custom image with sticky bits, then jailbreak is possible.
{quote}
I don't understand how that is exploitable.
The ENTRYPOINT/CMD will be run as the user supplied by YARN. If the ENTRYPOINT/CMD is a setuid binary that gives that user root access in the container, this becomes true, but I can do that without a privileged container.
{quote}
Docker does not make any change to the file permission.
{quote}
That's my point. Consider the following Dockerfile:
{code}
FROM centos
RUN useradd foo
USER foo
COPY run.sh /
CMD /run.sh
{code}
I then submit an application as user "skumpf" that uses the image above. The localized resources and container launch script are owned by "skumpf" on the host and will be bind mounted into the container. With the current behavior using {{docker run}} and {{--user}}, the launch script will be run as "skumpf" (per our docs, skumpf must exist in the container and have the same UID as on the host), even in the privileged case. If we remove {{--user}} from {{docker run}} in the privileged case, then the launch script will be executed by user "foo" in my container, using whatever UID "foo" has in the container. User "foo" in the container does not have permission to execute the launch script owned by "skumpf", and thus the container will fail to launch with a permission denied error. We need the {{--user/uid}} option even if privileged is requested, because without it, we have no idea what user the container will run as.

> User and Group mapping are incorrect in docker container
> --------------------------------------------------------
>
>                 Key: YARN-7430
>                 URL: https://issues.apache.org/jira/browse/YARN-7430
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: security, yarn
>    Affects Versions: 2.9.0, 3.0.0
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Blocker
>         Attachments: YARN-7430.001.patch
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to
> enforce user and group for the running user. In YARN-6623, this translated
> to --user=test --group-add=group1. The code no longer enforces group
> correctly for the launched process.
> In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7453) Fix issue where RM fails to switch to active after first successful start
[ https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh reassigned YARN-7453: - Assignee: Rohith Sharma K S (was: Arun Suresh) > Fix issue where RM fails to switch to active after first successful start > - > > Key: YARN-7453 > URL: https://issues.apache.org/jira/browse/YARN-7453 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.1.0 >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-7453.001.patch, YARN-7453.001.patch > > > It is observed that RM fail to switch to ACTIVE after first successful start! > The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. > This continues in loop! > {noformat} > 2017-11-07 15:08:11,664 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to active state > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery > started > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded > RM state version info 1.5 > 2017-11-07 15:08:11,670 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > 
org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403) > at > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607) > at 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505) > {noformat}
[jira] [Commented] (YARN-7453) Fix issue where RM fails to switch to active after first successful start
[ https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244251#comment-16244251 ] Hudson commented on YARN-7453: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13203 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13203/]) YARN-7453. Fix issue where RM fails to switch to active after first (arun suresh: rev a9c70b0e84dab0c41e480a0dc0cb1a22efdc64ee) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/ZKConfigurationStore.java > Fix issue where RM fails to switch to active after first successful start > - > > Key: YARN-7453 > URL: https://issues.apache.org/jira/browse/YARN-7453 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.1.0 >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Fix For: 2.9.0, 3.0.0, 3.1.0 > > Attachments: YARN-7453.001.patch, YARN-7453.001.patch > > > It is observed that RM fail to switch to ACTIVE after first successful start! > The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. > This continues in loop! 
> {noformat} > 2017-11-07 15:08:11,664 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to active state > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery > started > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded > RM state version info 1.5 > 2017-11-07 15:08:11,670 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403) > at > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894) > at >
[jira] [Commented] (YARN-7458) TestContainerManagerSecurity is still flakey
[ https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244226#comment-16244226 ] Jason Lowe commented on YARN-7458: -- Thanks for the patch! If the container never completes then the method just moves on as if it did. Shouldn't it throw? Assuming it should, GenericTestUtils.waitFor seems appropriate here. Nit: I'm never a fan of 1 second sleeps in tests (or sleeps at all if we can avoid it). It's almost always overkill and makes the test slower than it needs to be. If a test had to wait for 10 containers to complete serially that's 10 seconds of wasted test time. I'd change this to at most 100msec, probably just 10msec. > TestContainerManagerSecurity is still flakey > > > Key: YARN-7458 > URL: https://issues.apache.org/jira/browse/YARN-7458 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-7458.001.patch > > > YARN-6150 made this less flakey, but we're still seeing an occasional issue > here: > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167) > {noformat}
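The shape Jason is suggesting (poll a condition on a short interval, fail loudly on timeout) can be sketched as a self-contained helper. Hadoop's GenericTestUtils.waitFor has a similar signature (condition, poll interval, timeout), but the code below is an illustrative standalone version, not the Hadoop API itself:

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class WaitForSketch {
  // Poll `condition` every `intervalMs` until it returns true, or throw
  // TimeoutException after `timeoutMs` - unlike the original test code,
  // a never-completing container now fails the test instead of being
  // silently ignored.
  static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException("condition not met within " + timeoutMs + "ms");
      }
      Thread.sleep(intervalMs); // a 10ms interval wastes far less time than a 1s sleep
    }
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger polls = new AtomicInteger();
    // Simulated "container completed" check that succeeds on the third poll.
    waitFor(() -> polls.incrementAndGet() >= 3, 10, 5000);
    System.out.println("polls=" + polls.get());
  }
}
```

With a 10ms interval, ten serially completing containers cost at most a few hundred milliseconds of polling overhead rather than the 10 seconds a fixed 1-second sleep would impose.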
[jira] [Updated] (YARN-7343) Add a junit test for ContainerScheduler recovery
[ https://issues.apache.org/jira/browse/YARN-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-7343: -- Fix Version/s: (was: 2.9.0) > Add a junit test for ContainerScheduler recovery > > > Key: YARN-7343 > URL: https://issues.apache.org/jira/browse/YARN-7343 > Project: Hadoop YARN > Issue Type: Task >Reporter: kartheek muthyala >Assignee: Sampada Dehankar >Priority: Minor > Fix For: 3.1.0 > > Attachments: YARN-7343.001.patch, YARN-7343.002.patch, > YARN-7343.003.patch > > > With queuing at NM, Container recovery becomes interesting. Add a junit test > for recovering containers in different states. This should test the recovery > with the ContainerScheduler class that was introduced for enabling container > queuing on contention of resources.
[jira] [Commented] (YARN-7453) RM fail to switch to active after first successful start
[ https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243986#comment-16243986 ] Hadoop QA commented on YARN-7453: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 37 unchanged - 0 fixed = 38 total (was 37) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 56s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}114m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer | | | hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing | | Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands | | | org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7453 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896637/YARN-7453.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux aa45fb2f4809 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release
[ https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244182#comment-16244182 ] Haibo Chen commented on YARN-7346: -- Please help me understand this. The mapreduce.tar.gz is shipped for every hbase mapreduce job as a resource that will be localized by YARN for every container, right? If so, mapreduce.tar.gz should ideally contain just mapreduce client modules and their dependency modules, and yarn-node-manager is not one of them. Is the dependency of hbase mapreduce job on node-manager jars necessary? > Fix compilation errors against hbase2 alpha release > --- > > Key: YARN-7346 > URL: https://issues.apache.org/jira/browse/YARN-7346 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Ted Yu >Assignee: Vrushali C > > When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, > I got the following errors: > https://pastebin.com/Ms4jYEVB > This issue is to fix the compilation errors.
[jira] [Updated] (YARN-7453) Fix issue where RM fails to switch to active after first successful start
[ https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-7453: -- Summary: Fix issue where RM fails to switch to active after first successful start (was: RM fail to switch to active after first successful start) > Fix issue where RM fails to switch to active after first successful start > - > > Key: YARN-7453 > URL: https://issues.apache.org/jira/browse/YARN-7453 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.1.0 >Reporter: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-7453.001.patch, YARN-7453.001.patch > > > It is observed that RM fail to switch to ACTIVE after first successful start! > The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. > This continues in loop! > {noformat} > 2017-11-07 15:08:11,664 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to active state > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery > started > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded > RM state version info 1.5 > 2017-11-07 15:08:11,670 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > 
org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403) > at > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607) > at 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505) > {noformat}
[jira] [Assigned] (YARN-7453) Fix issue where RM fails to switch to active after first successful start
[ https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh reassigned YARN-7453: - Assignee: Arun Suresh > Fix issue where RM fails to switch to active after first successful start > - > > Key: YARN-7453 > URL: https://issues.apache.org/jira/browse/YARN-7453 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.1.0 >Reporter: Rohith Sharma K S >Assignee: Arun Suresh >Priority: Blocker > Attachments: YARN-7453.001.patch, YARN-7453.001.patch > > > It is observed that RM fail to switch to ACTIVE after first successful start! > The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. > This continues in loop! > {noformat} > 2017-11-07 15:08:11,664 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to active state > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery > started > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded > RM state version info 1.5 > 2017-11-07 15:08:11,670 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at 
org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403) > at > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505) > {noformat}
[jira] [Commented] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244191#comment-16244191 ] Jason Lowe commented on YARN-3091: -- bq. I'm looking at reverting the read/write lock changes within the fair scheduler at least. Thoughts? +1, we've also seen a number of problems around the scheduler's read/write locks and have done some short-term fixes to work around them like YARN-6680. They are significantly more expensive to acquire than a standard mutex if nobody is holding the lock, and there are lots of places where the scheduler needs to acquire them during a scheduling pass. > [Umbrella] Improve and fix locks of RM scheduler > > > Key: YARN-3091 > URL: https://issues.apache.org/jira/browse/YARN-3091 > Project: Hadoop YARN > Issue Type: Task > Components: capacityscheduler, fairscheduler, resourcemanager, > scheduler >Reporter: Wangda Tan > > In existing YARN RM scheduler, there're some issues of using locks. For > example: > - Many unnecessary synchronized locks, we have seen several cases recently > that too frequent access of scheduler makes scheduler hang. Which could be > addressed by using read/write lock. Components include scheduler, CS queues, > apps > - Some fields not properly locked (Like clusterResource) > We can address them together in this ticket. > (More details see comments below)
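The uncontended-acquisition cost Jason describes is easy to see with a naive micro-measurement. The sketch below is illustrative only (no JMH-style warmup control, so treat the printed numbers as rough relative indications, not a rigorous benchmark):

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UncontendedLockCost {
  static final int ITERS = 2_000_000;

  // Time ITERS uncontended lock/unlock pairs on a plain mutex.
  static long timeMutex() {
    ReentrantLock lock = new ReentrantLock();
    long t0 = System.nanoTime();
    long sum = 0;
    for (int i = 0; i < ITERS; i++) {
      lock.lock();
      try { sum += i; } finally { lock.unlock(); }
    }
    if (sum == 42) System.out.print(""); // keep `sum` live so the loop isn't eliminated
    return System.nanoTime() - t0;
  }

  // Time ITERS uncontended acquire/release pairs on a read/write lock's read side.
  static long timeReadLock() {
    ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    long t0 = System.nanoTime();
    long sum = 0;
    for (int i = 0; i < ITERS; i++) {
      rw.readLock().lock();
      try { sum += i; } finally { rw.readLock().unlock(); }
    }
    if (sum == 42) System.out.print("");
    return System.nanoTime() - t0;
  }

  public static void main(String[] args) {
    // One throwaway pass each for JIT warmup, then a measured pass.
    timeMutex(); timeReadLock();
    System.out.printf("mutex:    %d ms%n", timeMutex() / 1_000_000);
    System.out.printf("readlock: %d ms%n", timeReadLock() / 1_000_000);
  }
}
```

On a scheduler hot path that takes and releases these locks many times per scheduling pass, even a modest per-acquisition difference multiplies into measurable overhead, which is the motivation for reverting to plain synchronization where reads don't actually run concurrently.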
[jira] [Commented] (YARN-7453) RM fail to switch to active after first successful start
[ https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244205#comment-16244205 ] Arun Suresh commented on YARN-7453: --- +1 for the patch. I ran the failed and timed-out tests locally - they work for me. They just seem to be flaky. Committing this shortly (will take care of checkstyle when I commit). > RM fail to switch to active after first successful start > > > Key: YARN-7453 > URL: https://issues.apache.org/jira/browse/YARN-7453 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.1.0 >Reporter: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-7453.001.patch, YARN-7453.001.patch > > > It is observed that RM fail to switch to ACTIVE after first successful start! > The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. > This continues in loop! > {noformat} > 2017-11-07 15:08:11,664 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to active state > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery > started > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded > RM state version info 1.5 > 2017-11-07 15:08:11,670 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > 
org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403) > at > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473) 
> at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7343) Add a junit test for ContainerScheduler recovery
[ https://issues.apache.org/jira/browse/YARN-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-7343: -- Fix Version/s: 3.1.0 2.9.0 > Add a junit test for ContainerScheduler recovery > > > Key: YARN-7343 > URL: https://issues.apache.org/jira/browse/YARN-7343 > Project: Hadoop YARN > Issue Type: Task >Reporter: kartheek muthyala >Assignee: Sampada Dehankar >Priority: Minor > Fix For: 2.9.0, 3.1.0 > > Attachments: YARN-7343.001.patch, YARN-7343.002.patch, > YARN-7343.003.patch > > > With queuing at NM, Container recovery becomes interesting. Add a junit test > for recovering containers in different states. This should test the recovery > with the ContainerScheduler class that was introduced for enabling container > queuing on contention of resources.
[jira] [Updated] (YARN-6128) Add support for AMRMProxy HA
[ https://issues.apache.org/jira/browse/YARN-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-6128: --- Attachment: YARN-6128.v7.patch > Add support for AMRMProxy HA > > > Key: YARN-6128 > URL: https://issues.apache.org/jira/browse/YARN-6128 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, nodemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-6128.v0.patch, YARN-6128.v1.patch, > YARN-6128.v1.patch, YARN-6128.v2.patch, YARN-6128.v3.patch, > YARN-6128.v3.patch, YARN-6128.v4.patch, YARN-6128.v5.patch, > YARN-6128.v6.patch, YARN-6128.v7.patch > > > YARN-556 added the ability for RM failover without losing any running > applications. In a Federated YARN environment, there's additional state in > the {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we > need to enhance {{AMRMProxy}} to support HA.
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244498#comment-16244498 ] Shane Kumpf commented on YARN-7430: --- I still believe there will be an issue if we do not specify --user, as this causes problems for launching the container. Please try running distributed shell or similar using the Dockerfile I provided with --user removed, and you will see the behavior: the container will fail to launch. IIUC, {{\-\-privileged}} == {{\-\-user=root}} (or {{--user=0:0}}) in your view, correct? If so, doing that would satisfy the condition here if we set the user to root for privileged containers. I see some cases where that isn't necessary and I'm unsure how it might impact log aggregation, but I think it could work. > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforces the group > correctly for the launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in the container to translate the username and group to uid/gid. > For users on LDAP, there is no good way to populate the container with user and > group information.
[jira] [Commented] (YARN-7457) Delay scheduling should be an individual policy instead of part of scheduler implementation
[ https://issues.apache.org/jira/browse/YARN-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244426#comment-16244426 ] Daniel Templeton commented on YARN-7457: I have a long to-do list. :) We'll see who gets there first. > Delay scheduling should be an individual policy instead of part of scheduler > implementation > --- > > Key: YARN-7457 > URL: https://issues.apache.org/jira/browse/YARN-7457 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > > Currently, different schedulers have slightly different delay scheduling > implementations. Ideally we should make delay scheduling independent of the > scheduler implementation. Benefits of doing this: > 1) Applications can choose which delay scheduling policy to use; it could be > time-based, missed-opportunity-based, or any other delay scheduling > policy supported by the cluster. Currently it is a global scheduler config. > 2) Scheduler implementations become simpler and more reusable.
[jira] [Comment Edited] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1623#comment-1623 ] Eric Yang edited comment on YARN-7430 at 11/8/17 6:07 PM: -- [~shaneku...@gmail.com], thank you for explaining your point of view. I understand how you arrived at these conclusions, but some use cases cannot be satisfied by the current implementation. {quote} User "foo" in the container does not have permission to execute the launch script owned by "skumpf" and thus the container will fail to launch with a permission denied error. We need the -user/uid option even if privileged is requested, because without it, we have no idea what user the container will run as. {quote} What is the point of the privileged flag if the process can only run properly as "skumpf" in a privileged container? When a container is granted root power, the root user should be able to do anything; why drop that privilege only to reacquire it later via a sticky bit? It is counterintuitive. Let's review the ground rules that Docker recommends and what we are recommending to Hadoop users. # The Docker security documentation clearly states that Docker must be run by trusted users only. This means the user either has sudo privileges or is part of the docker group. # A privileged container allows the ENTRYPOINT to spawn a multi-user environment, such as systemd or an init-like environment, for multi-user support. # The Hadoop YARN user can be a trusted user that spawns docker containers on behalf of the end user. # Hadoop simulates the doAs call through container-executor, so Docker's security recommendation stays intact. If a container must run for an end user who is neither a privileged user nor part of the docker group, then precautions must be taken to secure the point of entry by the yarn user or container-executor.
# Docker does not know about external users and groups in LDAP, hence use of {{\-\-user username}} is essentially limited to the container's {{/etc/passwd}} and {{/etc/group}} for group-membership lookup. Users and groups can be baked in at docker image build time, but this solution cannot be generalized for LDAP users in the Hadoop ecosystem; we don't want to end up rebuilding images each time a new LDAP user is added. # Docker added {{\-\-user uid:gid}} and {{\-\-group-add}} to assign the user credential and group membership without depending on /etc/passwd and /etc/group lookups, which works for dynamic users. To resolve the conflicting user management between Docker and Hadoop, we must streamline the implementation to support both multi-user docker containers (privileged containers) and single-LDAP-user containers (non-privileged containers). A privileged container can only be spawned by a trusted user for a trusted user. Hence, the privileged container image can contain multiple users that are already pre-approved by the system administrator. A privileged container can acquire additional resources using mount points, and consistent file system ACLs inside and outside the container govern the overall security. There should never be a case where we allow a localized resource for {{skumpf}} to work as the {{foo}} user without a properly secured file system ACL. At the least, we don't want to make this case work, to ensure file system ACL rules are not broken. {{skumpf}} must do more work to secure the localized resource with proper permissions, if he has the power. Ultimately, file system permissions are the last line of security defense we have for storing files in HDFS via an NFS mount point. From this point of view, does it make more sense to run {{\-\-privileged}} without {{\-\-user username}}? was (Author: eyang): [~shaneku...@gmail.com], thank you for explaining your point of view.
[jira] [Commented] (YARN-7330) Add support to show GPU on UI/metrics
[ https://issues.apache.org/jira/browse/YARN-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244289#comment-16244289 ] Sunil G commented on YARN-7330: --- cc/ [~skmvasu] > Add support to show GPU on UI/metrics > - > > Key: YARN-7330 > URL: https://issues.apache.org/jira/browse/YARN-7330 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-7330.0-wip.patch, YARN-7330.003.patch, > YARN-7330.004.patch, YARN-7330.1-wip.patch, YARN-7330.2-wip.patch, > screencapture-0-wip.png > > > We should be able to view GPU metrics from UI/REST API.
[jira] [Commented] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value
[ https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244338#comment-16244338 ] Daniel Templeton commented on YARN-7461: Thanks for the patch. A couple of comments: # Missing a space before the '{' on DominantResourceCalculator:L393 # Instead of setting up the resource by hand in {{testRatioWithResourceValuesContainZero()}}, why not call {{setupExtraResource()}}? > DominantResourceCalculator#ratio calculation problem when right resource > contains zero value > > > Key: YARN-7461 > URL: https://issues.apache.org/jira/browse/YARN-7461 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Minor > Attachments: YARN-7461.001.patch > > > Currently DominantResourceCalculator#ratio may return a wrong result when the right > resource contains a zero value. For example, with three resource types, > leftResource=<5, 5, 0> and > rightResource=<10, 10, 0>, we expect the result of > DominantResourceCalculator#ratio(leftResource, rightResource) to be 0.5, but > currently it is NaN. > There should be a check before the division to ensure that > the divisor is not zero.
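The guard the issue describes can be sketched as follows. This is an illustrative standalone version, not the actual DominantResourceCalculator code: it uses plain {{long[]}} vectors in place of YARN's Resource objects, and the class and method names are made up for the example.

```java
// Illustrative sketch of the divide-by-zero guard described in YARN-7461.
// Resource vectors are plain long[] arrays; the real fix operates on
// YARN Resource objects inside DominantResourceCalculator.
public class RatioSketch {

    // Returns the maximum of left[i]/right[i] over all resource types,
    // skipping types whose divisor is zero so 0/0 never yields NaN.
    static float ratio(long[] left, long[] right) {
        float max = 0.0f;
        for (int i = 0; i < left.length; i++) {
            if (right[i] == 0) {
                continue; // guard: avoid 0/0 -> NaN and x/0 -> Infinity
            }
            max = Math.max(max, (float) left[i] / right[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        // Example from the issue: <5, 5, 0> vs <10, 10, 0> should be 0.5,
        // not NaN from the zero-valued third resource type.
        System.out.println(ratio(new long[]{5, 5, 0}, new long[]{10, 10, 0}));
    }
}
```

With the guard in place, the zero-valued resource type is simply ignored in the dominance comparison instead of poisoning the whole result with NaN.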
[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244381#comment-16244381 ] Haibo Chen commented on YARN-7388: -- Thanks [~rkanter] for the review! killContainer() is solely called in TestAMRestart to simulate AM container failures. In that sense, none of the available ContainerExitStatus values matches that intention nicely. The closest to the method name is probably KILLED_BY_RESOURCEMANAGER if we ignore the real intention. Will change the status to that in FairScheduler and address the other comments. > TestAMRestart should be scheduler agnostic > -- > > Key: YARN-7388 > URL: https://issues.apache.org/jira/browse/YARN-7388 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: YARN-7388.00.patch
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1623#comment-1623 ] Eric Yang commented on YARN-7430: - [~shaneku...@gmail.com], thank you for explaining your point of view. I understand how you arrived at these conclusions, but some use cases cannot be satisfied by the current implementation. {quote} User "foo" in the container does not have permission to execute the launch script owned by "skumpf" and thus the container will fail to launch with a permission denied error. We need the -user/uid option even if privileged is requested, because without it, we have no idea what user the container will run as. {quote} What is the point of the privileged flag if the process can only run properly as "skumpf" in a privileged container? When a container is granted root power, the root user should be able to do anything; why drop that privilege only to reacquire it later via a sticky bit? It is counterintuitive. Let's review the ground rules that Docker recommends and what we are recommending to Hadoop users. # The Docker security documentation clearly states that Docker must be run by trusted users only. This means the user either has sudo privileges or is part of the docker group. # A privileged container allows the ENTRYPOINT to spawn a multi-user environment, such as systemd or an init-like environment, for multi-user support. # The Hadoop YARN user can be a trusted user that spawns docker containers on behalf of the end user. # Hadoop simulates the doAs call through container-executor, so Docker's security recommendation stays intact. If a container must run for an end user who is neither a privileged user nor part of the docker group, then precautions must be taken to secure the point of entry by the yarn user or container-executor. # Docker does not know about external users and groups in LDAP, hence use of {{--user [username]}} is essentially limited to the container's {{/etc/passwd}} and {{/etc/group}} for group-membership lookup.
Users and groups can be baked in at docker image build time, but this solution cannot be generalized for LDAP users in the Hadoop ecosystem; we don't want to end up rebuilding images each time a new LDAP user is added. # Docker added {{--user uid:gid}} and {{--group-add}} to assign the user credential and group membership without depending on /etc/passwd and /etc/group lookups, which works for dynamic users. To resolve the conflicting user management between Docker and Hadoop, we must streamline the implementation to support both multi-user docker containers (privileged containers) and single-LDAP-user containers (non-privileged containers). A privileged container can only be spawned by a trusted user for a trusted user. Hence, the privileged container image can contain multiple users that are already pre-approved by the system administrator. A privileged container can acquire additional resources using mount points, and consistent file system ACLs inside and outside the container govern the overall security. There should never be a case where we allow a localized resource for {{skumpf}} to work as the {{foo}} user without a properly secured file system ACL. At the least, we don't want to make this case work, to ensure file system ACL rules are not broken. Ultimately, file system permissions are the last line of security defense we have for storing files in HDFS via an NFS mount point. From this point of view, does it make more sense to run {{--privileged}} without {{--user username}}? > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user.
In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforces the group > correctly for the launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in the container to translate the username and group to uid/gid. > For users on LDAP, there is no good way to populate the container with user and > group information.
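The numeric uid:gid mapping discussed above can be sketched as follows. This is a hypothetical helper for illustration only, not YARN's actual docker command-building code; the class and method names are made up, and the flag handling simply mirrors the two cases argued in the thread (privileged containers launched without {{--user}}, non-privileged containers launched with numeric {{--user uid:gid}} plus {{--group-add}}).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of assembling docker run arguments for the two cases
// discussed above. Numeric --user uid:gid and --group-add values avoid any
// /etc/passwd or /etc/group lookup inside the image, which is what makes
// LDAP-only users workable without rebuilding the image.
public class DockerUserArgs {

    static List<String> runArgs(String image, long uid, long gid,
                                long[] extraGids, boolean privileged) {
        List<String> args = new ArrayList<>();
        args.add("docker");
        args.add("run");
        if (privileged) {
            // Per the discussion, a privileged container may be launched
            // without --user so its ENTRYPOINT can manage users itself.
            args.add("--privileged");
        } else {
            args.add("--user");
            args.add(uid + ":" + gid);       // numeric: no /etc/passwd lookup
            for (long g : extraGids) {
                args.add("--group-add");
                args.add(Long.toString(g));  // supplementary group ids
            }
        }
        args.add(image);
        return args;
    }

    public static void main(String[] args) {
        System.out.println(runArgs("centos:7", 1000, 1000,
                new long[]{1001}, false));
        System.out.println(runArgs("centos:7", 0, 0, new long[0], true));
    }
}
```

The design choice mirrors the comment's conclusion: trust boundaries decide the flag set, and only the non-privileged path needs the numeric identity injected from outside the image.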
[jira] [Commented] (YARN-7457) Delay scheduling should be an individual policy instead of part of scheduler implementation
[ https://issues.apache.org/jira/browse/YARN-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244360#comment-16244360 ] Daniel Templeton commented on YARN-7457: I think it makes good sense to abstract that out as a service. It was actually on my to-do list. > Delay scheduling should be an individual policy instead of part of scheduler > implementation > --- > > Key: YARN-7457 > URL: https://issues.apache.org/jira/browse/YARN-7457 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > > Currently, different schedulers have slightly different delay scheduling > implementations. Ideally we should make delay scheduling independent of the > scheduler implementation. Benefits of doing this: > 1) Applications can choose which delay scheduling policy to use; it could be > time-based, missed-opportunity-based, or any other delay scheduling > policy supported by the cluster. Currently it is a global scheduler config. > 2) Scheduler implementations become simpler and more reusable.
[jira] [Commented] (YARN-7457) Delay scheduling should be an individual policy instead of part of scheduler implementation
[ https://issues.apache.org/jira/browse/YARN-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244474#comment-16244474 ] Wangda Tan commented on YARN-7457: -- Sounds good :) > Delay scheduling should be an individual policy instead of part of scheduler > implementation > --- > > Key: YARN-7457 > URL: https://issues.apache.org/jira/browse/YARN-7457 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > > Currently, different schedulers have slightly different delay scheduling > implementations. Ideally we should make delay scheduling independent of the > scheduler implementation. Benefits of doing this: > 1) Applications can choose which delay scheduling policy to use; it could be > time-based, missed-opportunity-based, or any other delay scheduling > policy supported by the cluster. Currently it is a global scheduler config. > 2) Scheduler implementations become simpler and more reusable.
[jira] [Commented] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244511#comment-16244511 ] Wangda Tan commented on YARN-3091: -- [~templedf]/[~jlowe], The RW locks were introduced to allow multiple threads to work on container allocation concurrently. From my test report: https://issues.apache.org/jira/secure/attachment/12831662/YARN-5139-Concurrent-scheduling-performance-report.pdf, we get about a 2.5X throughput improvement with 3 threads looking at the scheduler at the same time compared to a single thread. I agree that some previous locking changes (such as YARN-3139/YARN-3140/YARN-3141) can definitely be improved. But I think changing everything to a simple reentrant lock may hurt throughput when we have multiple threads doing allocation. > [Umbrella] Improve and fix locks of RM scheduler > > > Key: YARN-3091 > URL: https://issues.apache.org/jira/browse/YARN-3091 > Project: Hadoop YARN > Issue Type: Task > Components: capacityscheduler, fairscheduler, resourcemanager, > scheduler >Reporter: Wangda Tan > > In the existing YARN RM scheduler, there are some locking issues. For > example: > - Many unnecessary synchronized locks; we have seen several cases recently > where too-frequent access to the scheduler made it hang, which could be > addressed by using a read/write lock. Components include the scheduler, CS queues, and > apps > - Some fields not properly locked (like clusterResource) > We can address them together in this ticket. > (More details in the comments below)
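The read/write-lock pattern the comment refers to can be sketched as below. This is an illustrative standalone class under assumed names, not the actual CapacityScheduler code: many allocation threads take the shared read lock concurrently, while state updates (such as clusterResource changes, one of the fields the umbrella calls out) take the exclusive write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the read/write-lock pattern discussed above.
// Readers (allocation threads) proceed in parallel; writers (node add/
// remove, resource updates) get exclusive access. Names are made up.
public class SchedulerStateSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long clusterMemoryMb = 0;

    long readClusterMemory() {
        lock.readLock().lock();   // shared: multiple readers at once
        try {
            return clusterMemoryMb;
        } finally {
            lock.readLock().unlock();
        }
    }

    void addNode(long memoryMb) {
        lock.writeLock().lock();  // exclusive: blocks readers and writers
        try {
            clusterMemoryMb += memoryMb;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        SchedulerStateSketch s = new SchedulerStateSketch();
        s.addNode(4096);
        System.out.println(s.readClusterMemory());
    }
}
```

Compared to a single reentrant lock, this keeps the multi-threaded allocation throughput the comment measured, since readers never serialize against each other.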
[jira] [Assigned] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value
[ https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton reassigned YARN-7461: -- Assignee: Tao Yang (was: Daniel Templeton) > DominantResourceCalculator#ratio calculation problem when right resource > contains zero value > > > Key: YARN-7461 > URL: https://issues.apache.org/jira/browse/YARN-7461 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Minor > Attachments: YARN-7461.001.patch > > > Currently DominantResourceCalculator#ratio may return a wrong result when the right > resource contains a zero value. For example, with three resource types, > leftResource=<5, 5, 0> and > rightResource=<10, 10, 0>, we expect the result of > DominantResourceCalculator#ratio(leftResource, rightResource) to be 0.5, but > currently it is NaN. > There should be a check before the division to ensure that > the divisor is not zero.
[jira] [Assigned] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value
[ https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton reassigned YARN-7461: -- Assignee: Daniel Templeton > DominantResourceCalculator#ratio calculation problem when right resource > contains zero value > > > Key: YARN-7461 > URL: https://issues.apache.org/jira/browse/YARN-7461 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Tao Yang >Assignee: Daniel Templeton >Priority: Minor > Attachments: YARN-7461.001.patch > > > Currently DominantResourceCalculator#ratio may return a wrong result when the right > resource contains a zero value. For example, with three resource types, > leftResource=<5, 5, 0> and > rightResource=<10, 10, 0>, we expect the result of > DominantResourceCalculator#ratio(leftResource, rightResource) to be 0.5, but > currently it is NaN. > There should be a check before the division to ensure that > the divisor is not zero.
[jira] [Updated] (YARN-7388) TestAMRestart should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-7388: - Attachment: YARN-7388.01.patch > TestAMRestart should be scheduler agnostic > -- > > Key: YARN-7388 > URL: https://issues.apache.org/jira/browse/YARN-7388 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: YARN-7388.00.patch, YARN-7388.01.patch
[jira] [Updated] (YARN-7453) RM fail to switch to active after first successful start
[ https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-7453: Attachment: YARN-7453.001.patch The previous patch contained some additional modifications; attached a patch with only the required changes. > RM fail to switch to active after first successful start > > > Key: YARN-7453 > URL: https://issues.apache.org/jira/browse/YARN-7453 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.1.0 >Reporter: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-7453.001.patch, YARN-7453.001.patch > > > It is observed that the RM fails to switch to ACTIVE after the first successful start! > The exception below is thrown when the RM is switching from ACTIVE->STANDBY->ACTIVE. > This continues in a loop! > {noformat} > 2017-11-07 15:08:11,664 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to active state > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery > started > 2017-11-07 15:08:11,669 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded > RM state version info 1.5 > 2017-11-07 15:08:11,670 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to > load/recover state > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44) > at > org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129) > at > 
org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > at > org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122) > at > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403) > at > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607) > at 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505) > {noformat}
[jira] [Commented] (YARN-7453) RM fail to switch to active after first successful start
[ https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243862#comment-16243862 ] Hadoop QA commented on YARN-7453: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 21 new + 37 unchanged - 0 fixed = 58 total (was 37) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 56m 28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}102m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7453 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896632/YARN-7453.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f015a5df495e 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e4c220e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/18399/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18399/testReport/ | | Max. process+thread count | 866 (vs.
[jira] [Commented] (YARN-7440) Optimization to AM recovery when the service record doesn't exist for a container
[ https://issues.apache.org/jira/browse/YARN-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244810#comment-16244810 ] Chandni Singh commented on YARN-7440: - When a ServiceRecord doesn't exist for multiple containers belonging to the same component, then it is possible that a container is assigned to a different component instance when the AM recovers. Discussed this issue with [~jianhe] and [~billie.rinaldi] offline. Swapping containers to different component instances will cause naming conflicts inside the container process. Currently we don't get the component name from the _Container_, so in order to implement this correctly we need to wait for https://issues.apache.org/jira/browse/YARN-6594. > Optimization to AM recovery when the service record doesn't exist for a > container > - > > Key: YARN-7440 > URL: https://issues.apache.org/jira/browse/YARN-7440 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh > Fix For: yarn-native-services > > Attachments: YARN-7440.001.patch, YARN-7440.002.patch > > > When AM recovers, if the service record doesn’t exist for a container sent > from RM, it can re-query the container status from NM, today it will release > the container -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244825#comment-16244825 ] Shane Kumpf edited comment on YARN-7430 at 11/8/17 10:14 PM: - {quote} User foo should not allow to execute script owned by skumpf, unless skumpf granted permission to run the script {quote} User foo doesn't execute the script owned by skumpf if we pass the user skumpf. This is exactly how every container works today. We pass the user name and run the entrypoint in the container as this user, overriding what the image has set. This allows localization and logging to work. With the change to turn this off, we let the image decide, but only for privileged containers. The result is that any image that has "USER " in it, must be modified. {quote} --user=0:0 does not mean privileged. It means the entry point is granted with pseudo root privileges inside the container. {quote} Sorry, poorly worded. Do you think that the entry point process in a privileged container should always run as root? if so, we should enforce that by setting {{\-\-user=0:0}}. I think there is a place for containers where we don't set the user, but for those types to work, we'd need to get rid of all mounts and avoid overriding the entrypoint ("vanilla containers"). was (Author: shaneku...@gmail.com): {quote} User foo should not allow to execute script owned by skumpf, unless skumpf granted permission to run the script {quote} User foo doesn't execute the script owned by skumpf if we pass the user skumpf. This is exactly how every container works today. We pass the user name and run the entrypoint in the container as this user, overriding what the image has set. This allows localization and logging to work. With the change to turn this off, we let the image decide, but only for privileged containers. The result is that any image that has "USER " in it, must be modified. {quote} --user=0:0 does not mean privileged. 
It means the entry point is granted with pseudo root privileges inside the container. {quote} Sorry, poorly worded. Do you think that the entry point process in a privileged container should always run as root? if so, we should enforce that by setting {{\-\-user=0:0}}. I think there is a place for applications where we don't set the user, but for those types to work, we'd need to get rid of all mounts and avoid overriding the entrypoint ("vanilla containers"). > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforce group > correctly for launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7399) Yarn services metadata storage improvement
[ https://issues.apache.org/jira/browse/YARN-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7399: Attachment: YARN-7399.png See the attached diagram for the current implementation and proposed refinement. This will reduce duplicated code for storing metadata and support multiple storage types. > Yarn services metadata storage improvement > -- > > Key: YARN-7399 > URL: https://issues.apache.org/jira/browse/YARN-7399 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang > Attachments: YARN-7399.png > > > In Slider, metadata is stored in the user's home directory. The Slider command line > interface interacts with HDFS directly to list deployed applications and > invokes the YARN API or HDFS API to provide information to the user. This design works > for a single user managing his/her own applications. Now that this design has been > ported to Yarn services, it has become apparent that it makes it difficult for an > administrator to list all deployed applications on the Hadoop cluster in order to manage > them. The Resource Manager needs to crawl through every user's home > directory to compile metadata about deployed applications. This can put high load on the > namenode by issuing hundreds or thousands of list-directory calls > against directories owned by different users. Hence, it might be best to centralize the metadata > storage in Solr or HBase to reduce the number of IO calls to the namenode for managing > applications. > In Slider, one application is composed of metainfo, specifications in json, > and a payload zip file that contains application code and deployment code. > Both the meta information and the zip file payload are stored in the same > application directory in HDFS. This works well for distributed applications > without a central application manager that oversees all applications. 
> In the next generation of application management, we would like to centralize > the metainfo and json specifications in centralized storage managed by the YARN > user, and keep the payload zip file in the user's home directory or in a docker > registry. This arrangement can provide a faster lookup of metainfo when we > list all deployed applications and services on the YARN dashboard. > When we centralize metainfo under the YARN user, we also need to build an ACL to > enforce who can manage applications and make updates. The current proposal is: > yarn.admin.acl - list of groups that can submit/reconfigure/pause/kill all > applications > normal users - submit/reconfigure/pause/kill their own applications
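The access rule in the proposal above can be sketched in a few lines of Java: members of the groups in yarn.admin.acl may manage any service, while everyone else may manage only services they own. This is a hypothetical illustration of the semantics; the class and method names are invented, not actual YARN code:

```java
import java.util.Set;

/** Hypothetical sketch of the proposed service ACL rule. */
public class ServiceAclCheck {
    private final Set<String> adminGroups;

    public ServiceAclCheck(Set<String> adminGroups) {
        this.adminGroups = adminGroups;
    }

    /** True if user may submit/reconfigure/pause/kill the given service. */
    public boolean canManage(String user, Set<String> userGroups, String serviceOwner) {
        // members of yarn.admin.acl groups can act on all applications
        for (String g : userGroups) {
            if (adminGroups.contains(g)) {
                return true;
            }
        }
        // everyone else is limited to services they own
        return user.equals(serviceOwner);
    }

    public static void main(String[] args) {
        ServiceAclCheck acl = new ServiceAclCheck(Set.of("yarn-admins"));
        System.out.println(acl.canManage("alice", Set.of("users"), "alice"));     // true
        System.out.println(acl.canManage("bob", Set.of("users"), "alice"));       // false
        System.out.println(acl.canManage("ops", Set.of("yarn-admins"), "alice")); // true
    }
}
```

The actual enforcement point would live wherever the centralized metadata store is fronted (e.g. the RM REST layer), but the owner-or-admin decision itself is this small.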
[jira] [Comment Edited] (YARN-7166) Container REST endpoints should report resource types
[ https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244860#comment-16244860 ] Yufei Gu edited comment on YARN-7166 at 11/8/17 10:32 PM: -- Looks good to me generally. How about marking {{allocatedMB}} and {{allocatedVCores}} deprecated? Need a space before {{Long}} in {{protected Map<String,Long> allocatedResources;}} was (Author: yufeigu): Look good to me generally. How about marking {{allocatedMB}} and {{allocatedVCores}} deprecated? > Container REST endpoints should report resource types > - > > Key: YARN-7166 > URL: https://issues.apache.org/jira/browse/YARN-7166 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7166.YARN-3926.001.patch, > YARN-7166.YARN-3926.002.patch > >
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244787#comment-16244787 ] Eric Yang commented on YARN-7430: - [~shaneku...@gmail.com]: {quote} I still believe there will be an issue if we do not specify --user. This causes problems for launching the container. Please try running distributed shell or similar using the Dockerfile I provided with --user removed, and you will see the behavior, the container will fail to launch. {quote} The container fails for the right reason. User foo should not be allowed to execute a script owned by skumpf unless skumpf granted permission to run the script. {quote} IIUC, --privileged == --user=root (or --user=0:0) in your view, correct? If so, doing that would satisfy the condition here if we set the user to root for privileged containers. I see some cases where that isn't necessary and I'm unsure how it might impact log aggregation, but I think it could work. {quote} {{\-\-user=0:0}} does not mean privileged. It means the entry point is granted pseudo-root privileges inside the container; there is no guarantee that any capability at the host layer is granted. The {{\-\-privileged}} flag gives all capabilities to the container, and it also lifts all the limitations enforced by the device cgroup controller. In other words, the container can then do almost everything that the host can do. This flag exists to allow special use-cases, like running Docker within Docker. {{\-\-privileged}} is more destructive than pseudo root and should be handled carefully. System admins usually do not allow a user with sudo privileges to change resource utilization, hence I haven't seen a valid reason to apply the {{\-\-user}} flag to {{\-\-privileged}} containers. 
> User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforce group > correctly for launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
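The distinction drawn in this thread between {{\-\-user=0:0}} and {{\-\-privileged}} can be seen directly with the docker CLI. These commands are purely illustrative of the two flags' scopes (runnable against any small image such as busybox), not part of the YARN container launch path:

```shell
# uid 0 inside the container only: the entrypoint sees itself as root,
# but host capabilities and device cgroup limits still apply.
docker run --rm --user=0:0 busybox id

# All capabilities granted and device cgroup limits lifted, independent
# of which user the entrypoint runs as; for trusted workloads only.
docker run --rm --privileged busybox id
```

This is why the two flags are orthogonal in the discussion above: a container can be root-inside-the-namespace without being privileged, and privileged without running as root.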
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244811#comment-16244811 ] Eric Badger commented on YARN-7430: --- I don't see how running the container as root will work with log aggregation. Everything written inside of the container will be written to bind-mounted volumes as root, not as the user that submitted the job. This means that root will own all of these things once the container finishes. So I'm not sure how we can write logs correctly while also allowing escalated privilege inside the container. > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforce group > correctly for launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
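The ownership concern above follows from basic POSIX semantics: a file is owned by the effective UID of the process that created it. A minimal Java sketch (illustrative only, not the actual log-aggregation code path) shows how to observe the owner of a freshly written file; run inside a container whose entrypoint is root, the same check on a bind-mounted log directory would report root rather than the submitting user:

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Prints the owner of a newly written file. Run as the job user, this
 *  reports that user; run as root (e.g. a root entrypoint writing into a
 *  bind mount), it reports root, which is what breaks log aggregation. */
public class OwnerDemo {
    public static void main(String[] args) throws Exception {
        Path log = Files.createTempFile("container-log", ".txt");
        Files.writeString(log, "stdout from the container process\n");
        System.out.println("owner = " + Files.getOwner(log).getName());
        Files.delete(log);
    }
}
```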
[jira] [Commented] (YARN-7440) Optimization to AM recovery when the service record doesn't exist for a container
[ https://issues.apache.org/jira/browse/YARN-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244822#comment-16244822 ] Billie Rinaldi commented on YARN-7440: -- Sounds good. Thanks, [~csingh]! > Optimization to AM recovery when the service record doesn't exist for a > container > - > > Key: YARN-7440 > URL: https://issues.apache.org/jira/browse/YARN-7440 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh > Fix For: yarn-native-services > > Attachments: YARN-7440.001.patch, YARN-7440.002.patch > > > When AM recovers, if the service record doesn’t exist for a container sent > from RM, it can re-query the container status from NM, today it will release > the container
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244825#comment-16244825 ] Shane Kumpf commented on YARN-7430: --- {quote} User foo should not allow to execute script owned by skumpf, unless skumpf granted permission to run the script {quote} User foo doesn't execute the script owned by skumpf if we pass the user skumpf. This is exactly how every container works today. We pass the user name and run the entrypoint in the container as this user, overriding what the image has set. This allows localization and logging to work. With the change to turn this off, we let the image decide, but only for privileged containers. The result is that any image that has "USER " in it, must be modified. {quote} --user=0:0 does not mean privileged. It means the entry point is granted with pseudo root privileges inside the container. {quote} Sorry, poorly worded. Do you think that the entry point process in a privileged container should always run as root? if so, we should enforce that by setting {{\-\-user=0:0}}. I think there is a place for applications where we don't set the user, but for those types to work, we'd need to get rid of all mounts and avoid overriding the entrypoint ("vanilla containers"). > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforce group > correctly for launched process. 
> In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information.
[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244832#comment-16244832 ] Hadoop QA commented on YARN-7388: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 254 unchanged - 4 fixed = 254 total (was 258) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 1s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}112m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer | | Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands | | | org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | | | org.apache.hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA | | | org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7388 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896690/YARN-7388.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 59da72dbc653 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cb35a59 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_131 | |
[jira] [Assigned] (YARN-7399) Yarn services metadata storage improvement
[ https://issues.apache.org/jira/browse/YARN-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-7399: --- Assignee: Eric Yang > Yarn services metadata storage improvement > -- > > Key: YARN-7399 > URL: https://issues.apache.org/jira/browse/YARN-7399 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang > > In Slider, metadata is stored in user's home directory. Slider command line > interface interacts with HDFS directly to list deployed applications and > invoke YARN API or HDFS API to provide information to user. This design works > for a single user manage his/her own applications. When this design has been > ported to Yarn services, it becomes apparent that this design is difficult to > list all deployed applications on Hadoop cluster for administrator to manage > applications. Resource Manager needs to crawl through every user's home > directory to compile metadata about deployed applications. This can trigger > high load on namenode to list hundreds or thousands of list directory calls > owned by different users. Hence, it might be best to centralize the metadata > storage to Solr or HBase to reduce number of IO calls to namenode for manage > applications. > In Slider, one application is composed of metainfo, specifications in json, > and payload of zip file that contains application code and deployment code. > Both meta information, and zip file payload are stored in the same > application directory in HDFS. This works well for distributed applications > without central application manager that oversee all application. > In the next generation of application management, we like to centralize > metainfo and specifications in json to a centralized storage managed by YARN > user, and keep the payload zip file in user's home directory or in docker > registry. 
This arrangement can provide a faster lookup of metainfo when we > list all deployed applications and services on the YARN dashboard. > When we centralize metainfo under the YARN user, we also need to build an ACL to > enforce who can manage applications and make updates. The current proposal is: > yarn.admin.acl - list of groups that can submit/reconfigure/pause/kill all > applications > normal users - submit/reconfigure/pause/kill their own applications
[jira] [Commented] (YARN-7166) Container REST endpoints should report resource types
[ https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244860#comment-16244860 ] Yufei Gu commented on YARN-7166: Look good to me generally. How about marking {{allocatedMB}} and {{allocatedVCores}} deprecated? > Container REST endpoints should report resource types > - > > Key: YARN-7166 > URL: https://issues.apache.org/jira/browse/YARN-7166 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7166.YARN-3926.001.patch, > YARN-7166.YARN-3926.002.patch > >
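The suggestion above, deprecating the scalar fields while reporting all resource types through a map, could look roughly like the following. This is a hypothetical sketch, not the actual REST DAO class from the patch; the field names follow the review comments:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical DAO sketch: legacy scalar fields kept for compatibility
 *  but marked deprecated, with resource types reported through a map. */
public class ContainerResourceInfo {
    @Deprecated
    protected long allocatedMB;
    @Deprecated
    protected long allocatedVCores;
    protected Map<String, Long> allocatedResources = new LinkedHashMap<>();

    public ContainerResourceInfo(long mb, long vcores) {
        this.allocatedMB = mb;
        this.allocatedVCores = vcores;
        // the map supersedes the scalars and can carry arbitrary resource types
        allocatedResources.put("memory-mb", mb);
        allocatedResources.put("vcores", vcores);
    }

    public Map<String, Long> getAllocatedResources() {
        return allocatedResources;
    }
}
```

Keeping the deprecated scalars means existing REST clients continue to work while new clients read the map.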
[jira] [Commented] (YARN-6128) Add support for AMRMProxy HA
[ https://issues.apache.org/jira/browse/YARN-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244668#comment-16244668 ] Hadoop QA commented on YARN-6128: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 3m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 6s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 15s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 31s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 57s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 42s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 313 unchanged - 0 fixed = 315 total (was 313) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 48s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 10s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}
[jira] [Commented] (YARN-7440) Optimization to AM recovery when the service record doesn't exist for a container
[ https://issues.apache.org/jira/browse/YARN-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244669#comment-16244669 ] Chandni Singh commented on YARN-7440: - Seems like the test is failing because the previous container for the service master during _recovery_ has the same {{allocationRequestId}} as one of the component containers. Either the {{allocationRequestId}} for the service master container should be different, or, during recovery, we can check whether the container number is 1 and simply release it. > Optimization to AM recovery when the service record doesn't exist for a > container > - > > Key: YARN-7440 > URL: https://issues.apache.org/jira/browse/YARN-7440 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh > Fix For: yarn-native-services > > Attachments: YARN-7440.001.patch, YARN-7440.002.patch > > > When AM recovers, if the service record doesn’t exist for a container sent > from RM, it can re-query the container status from NM, today it will release > the container -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
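The recovery check suggested in the comment above can be sketched as follows. This is a minimal illustration, not the actual YARN-native-services code: the class, method names, and the special-casing of container #1 (the service master's own container) are assumptions based on the discussion.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of assigning recovered containers to components by
 * allocationRequestId, releasing the AM's own container instead of
 * mis-assigning it when the ids collide.
 */
public class RecoverySketch {
    private final Map<Long, String> componentByAllocId = new HashMap<>();

    public void registerComponent(long allocationRequestId, String component) {
        componentByAllocId.put(allocationRequestId, component);
    }

    /**
     * Returns the component name to re-attach the recovered container to,
     * or null when the container should simply be released.
     */
    public String assignRecoveredContainer(long allocationRequestId,
                                           long containerNumber) {
        // Container #1 is the service master itself; if its
        // allocationRequestId collides with a component's, a lookup by id
        // alone would mis-assign it, so release it instead.
        if (containerNumber == 1) {
            return null;
        }
        return componentByAllocId.get(allocationRequestId);
    }
}
```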
[jira] [Created] (YARN-7465) start-yarn.sh fails to start ResourceManager unless running as root
Sean Mackrory created YARN-7465: --- Summary: start-yarn.sh fails to start ResourceManager unless running as root Key: YARN-7465 URL: https://issues.apache.org/jira/browse/YARN-7465 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 3.1.0 Reporter: Sean Mackrory Priority: Blocker This was found when testing rolling upgrades in HDFS-11096. It manifests as the following: {quote}Starting resourcemanagers on [ container-8.docker container-9.docker] /home/hadoop/hadoop-3.0.0-SNAPSHOT/sbin/../libexec/hadoop-functions.sh: line 298: --config: command not found{quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7465) start-yarn.sh fails to start ResourceManager unless running as root
[ https://issues.apache.org/jira/browse/YARN-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated YARN-7465: Attachment: YARN-7465.001.patch > start-yarn.sh fails to start ResourceManager unless running as root > --- > > Key: YARN-7465 > URL: https://issues.apache.org/jira/browse/YARN-7465 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 3.1.0 >Reporter: Sean Mackrory >Priority: Blocker > Attachments: YARN-7465.001.patch > > > This was found when testing rolling upgrades in HDFS-11096. It manifests as > the following: > {quote}Starting resourcemanagers on [ container-8.docker container-9.docker] > /home/hadoop/hadoop-3.0.0-SNAPSHOT/sbin/../libexec/hadoop-functions.sh: line > 298: --config: command not found{quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7166) Container REST endpoints should report resource types
[ https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244911#comment-16244911 ] Yufei Gu commented on YARN-7166: +1, pending Jenkins. > Container REST endpoints should report resource types > - > > Key: YARN-7166 > URL: https://issues.apache.org/jira/browse/YARN-7166 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7166.003.patch, YARN-7166.YARN-3926.001.patch, > YARN-7166.YARN-3926.002.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244910#comment-16244910 ] Eric Badger commented on YARN-7430: --- {quote} For users on LDAP, there is no good way to populate container with user and group information. {quote} Additionally, bind-mounting /var/run/nscd will allow the container to use the host's LDAP configuration to look up users. That way, there won't be a cache miss every time a new container is started up. We could set up each container to use LDAP correctly, but that sounds wasteful because of all of the hits on the LDAP server. That's why entering the container as a uid:gid pair still gives you the username even if the user doesn't exist in the image. Otherwise, the uid:gid pair won't have an associated username and the MRAppMaster will fail. This was discussed briefly in [YARN-4266|https://issues.apache.org/jira/browse/YARN-4266?focusedCommentId=16076756=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16076756] > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforce group > correctly for launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent
[ https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245019#comment-16245019 ] Hadoop QA commented on YARN-7143: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-7143 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-7143 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896772/YARN-7143.003.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18408/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > FileNotFound handling in ResourceUtils is inconsistent > -- > > Key: YARN-7143 > URL: https://issues.apache.org/jira/browse/YARN-7143 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7143.002.patch, YARN-7143.003.patch, > YARN-7143.YARN-3926.001.patch > > > When loading the resource-types.xml file, we warn and move on if it's not > found. When loading the node-resource.xml file, we abort loading resource > types if the file isn't found. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
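The consistent behavior YARN-7143 asks for — treat both optional config files the same way, warning and falling back to defaults instead of aborting — can be sketched as below. This is an illustrative helper, not the actual ResourceUtils code; the method name and fallback value are assumptions.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Paths;

/**
 * Sketch: every optional resource config file (resource-types.xml,
 * node-resources.xml) gets the same missing-file treatment.
 */
public class ResourceFileLoader {
    public static String loadOrDefault(String path, String defaultValue) {
        try {
            return new String(Files.readAllBytes(Paths.get(path)),
                              StandardCharsets.UTF_8);
        } catch (NoSuchFileException | FileNotFoundException e) {
            // Warn and move on for any optional file -- never abort.
            System.err.println("WARN: " + path + " not found; using defaults");
            return defaultValue;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```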
[jira] [Commented] (YARN-7166) Container REST endpoints should report resource types
[ https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244892#comment-16244892 ] Daniel Templeton commented on YARN-7166: I don't think that's needed. CPU and memory are accessed frequently enough that they deserve dedicated variables and methods. Maybe later after resource types has settled in a bit more... > Container REST endpoints should report resource types > - > > Key: YARN-7166 > URL: https://issues.apache.org/jira/browse/YARN-7166 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7166.YARN-3926.001.patch, > YARN-7166.YARN-3926.002.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7166) Container REST endpoints should report resource types
[ https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-7166: --- Attachment: YARN-7166.003.patch > Container REST endpoints should report resource types > - > > Key: YARN-7166 > URL: https://issues.apache.org/jira/browse/YARN-7166 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7166.003.patch, YARN-7166.YARN-3926.001.patch, > YARN-7166.YARN-3926.002.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7166) Container REST endpoints should report resource types
[ https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244955#comment-16244955 ] Hadoop QA commented on YARN-7166: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: The patch generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 2s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 44m 55s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7166 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896757/YARN-7166.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 38f42c154e16 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cb35a59 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/18405/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18405/testReport/ | | Max. process+thread count | 432 (vs. ulimit of 5000) | | modules | C:
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244970#comment-16244970 ] Eric Yang commented on YARN-7430: - [~ebadger] They are two separate problems. A lot of the conversation here belongs to YARN-7446. This issue is to tackle the problem that we have an implicit privilege escalation security hole in the default shipped configuration when the following conditions are met: # Privileged container is enabled. # Deploy docker container with user mapping to a different uid:gid than host OS, or using a numeric username to launch app. # Data output from container is written as someone else or with root group ownership. In summary, to prevent privilege escalation, we should always pass in the primary group to improve security. > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforce group > correctly for launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
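The two recommendations in this thread — always pass the numeric uid together with the user's primary gid, and bind-mount nscd's socket so LDAP users resolve inside the container — can be sketched as a command assembly like the one below. The class and method names are illustrative assumptions; the real logic lives in the container runtime, not in a helper like this.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of building a docker run command with an explicit uid:gid. */
public class DockerUserFlag {
    /** Build the --user value from a numeric uid and primary gid. */
    public static String userFlag(int uid, int primaryGid) {
        return "--user=" + uid + ":" + primaryGid;
    }

    /** Assemble an illustrative docker run command line. */
    public static List<String> runCommand(String image, int uid, int primaryGid) {
        List<String> cmd = new ArrayList<>();
        cmd.add("docker");
        cmd.add("run");
        // Pinning uid:gid prevents the process from falling back to root's
        // group when the user is not present in the image.
        cmd.add(userFlag(uid, primaryGid));
        // Bind-mounting nscd's socket (as suggested earlier in the thread)
        // lets the container resolve LDAP users through the host's cache.
        cmd.add("-v");
        cmd.add("/var/run/nscd:/var/run/nscd:ro");
        cmd.add(image);
        return cmd;
    }
}
```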
[jira] [Created] (YARN-7466) ResourceRequest has a different default for allocationRequestId than Container
Chandni Singh created YARN-7466: --- Summary: ResourceRequest has a different default for allocationRequestId than Container Key: YARN-7466 URL: https://issues.apache.org/jira/browse/YARN-7466 Project: Hadoop YARN Issue Type: Bug Reporter: Chandni Singh Assignee: Chandni Singh The default value of allocationRequestId is inconsistent. It is -1 in {{ContainerProto}} but 0 in {{ResourceRequestProto}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
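The mismatch YARN-7466 reports can be made concrete with a small sketch. Under the assumption that 0 is never a legitimate allocation request id, a client could normalize both protobuf defaults to one canonical "unset" value; the constant names here are hypothetical, not from the YARN codebase.

```java
/** Sketch of reconciling the two default allocationRequestId values. */
public class AllocationRequestIds {
    // Default in ContainerProto, per the report.
    public static final long CONTAINER_DEFAULT = -1L;
    // Default in ResourceRequestProto, per the report.
    public static final long RESOURCE_REQUEST_DEFAULT = 0L;

    /** Map either proto default onto a single canonical "unset" value. */
    public static long normalize(long allocationRequestId) {
        return allocationRequestId == RESOURCE_REQUEST_DEFAULT
            ? CONTAINER_DEFAULT : allocationRequestId;
    }
}
```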
[jira] [Commented] (YARN-7437) Give SchedulingPlacementSet to a better name.
[ https://issues.apache.org/jira/browse/YARN-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245000#comment-16245000 ] Konstantinos Karanasos commented on YARN-7437: -- Thanks, [~leftnoteasy]! Looks good, will commit it to trunk shortly. > Give SchedulingPlacementSet to a better name. > - > > Key: YARN-7437 > URL: https://issues.apache.org/jira/browse/YARN-7437 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-7437.001.patch, YARN-7437.002.patch, > YARN-7437.003.patch, YARN-7437.004.patch > > > Currently, the SchedulingPlacementSet is very confusing. Here're its > responsibilities: > 1) Store ResourceRequests. (Or SchedulingRequest after YARN-6592). > 2) Decide order of nodes to allocate when there're multiple node candidates. > 3) Decide if we should reject node for given requests. > 4) Store any states/cache can help make decision for #2/#3 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent
[ https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-7143: --- Attachment: YARN-7143.003.patch Good point. > FileNotFound handling in ResourceUtils is inconsistent > -- > > Key: YARN-7143 > URL: https://issues.apache.org/jira/browse/YARN-7143 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7143.002.patch, YARN-7143.003.patch, > YARN-7143.YARN-3926.001.patch > > > When loading the resource-types.xml file, we warn and move on if it's not > found. When loading the node-resource.xml file, we abort loading resource > types if the file isn't found. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7399) Yarn services metadata storage improvement
[ https://issues.apache.org/jira/browse/YARN-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244883#comment-16244883 ] Eric Yang commented on YARN-7399: - The purpose of the metadata storage API is to provide a simple, low-latency key/value lookup for Yarnfiles. We will call this API an "application catalog" as a generic term to represent this function. The features of the application catalog are: 1. Register an application record for deployment. 2. Update the configuration of an existing application. 3. Decommission an application record. 4. Retrieve information about an application record. 5. Search application records by user or application name. > Yarn services metadata storage improvement > -- > > Key: YARN-7399 > URL: https://issues.apache.org/jira/browse/YARN-7399 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang > Attachments: YARN-7399.png > > > In Slider, metadata is stored in the user's home directory. The Slider command line > interface interacts with HDFS directly to list deployed applications and > invokes the YARN API or HDFS API to provide information to the user. This design works > for a single user managing his/her own applications. When this design was > ported to Yarn services, it became apparent that with this design it is difficult to > list all deployed applications on the Hadoop cluster for an administrator to manage > applications. The Resource Manager needs to crawl through every user's home > directory to compile metadata about deployed applications. This can trigger > high load on the namenode with hundreds or thousands of list-directory calls on directories > owned by different users. Hence, it might be best to centralize the metadata > storage in Solr or HBase to reduce the number of IO calls to the namenode for managing > applications. > In Slider, one application is composed of metainfo, specifications in json, > and a payload zip file that contains application code and deployment code. 
> Both the meta information and the zip file payload are stored in the same > application directory in HDFS. This works well for distributed applications > without a central application manager that oversees all applications. > In the next generation of application management, we would like to centralize > the metainfo and specifications in json in a centralized storage managed by the YARN > user, and keep the payload zip file in the user's home directory or in a docker > registry. This arrangement can provide a faster lookup of metainfo when we > list all deployed applications and services on the YARN dashboard. > When we centralize metainfo to the YARN user, we also need to build ACLs to > enforce who can manage applications and make updates. The current proposal is: > yarn.admin.acl - list of groups that can submit/reconfigure/pause/kill all > applications > normal users - submit/reconfigure/pause/kill his/her own applications -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
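The five catalog operations listed in the comment above can be sketched as a simple key/value API. The interface and the in-memory store are illustrative assumptions only; the comment proposes Solr or HBase as the real backing store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the proposed application-catalog operations. */
public class ApplicationCatalog {
    private final Map<String, Map<String, String>> records = new HashMap<>();

    /** 1. Register an application record for deployment. */
    public void register(String name, Map<String, String> spec) {
        records.put(name, new HashMap<>(spec));
    }

    /** 2. Update the configuration of an existing application. */
    public void update(String name, String key, String value) {
        records.get(name).put(key, value);
    }

    /** 3. Decommission an application record. */
    public void decommission(String name) {
        records.remove(name);
    }

    /** 4. Retrieve information about an application record. */
    public Map<String, String> get(String name) {
        return records.get(name);
    }

    /** 5. Search application records by owner. */
    public List<String> searchByUser(String user) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : records.entrySet()) {
            if (user.equals(e.getValue().get("user"))) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```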
[jira] [Commented] (YARN-7465) start-yarn.sh fails to start ResourceManager unless running as root
[ https://issues.apache.org/jira/browse/YARN-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244896#comment-16244896 ] Sean Mackrory commented on YARN-7465: - I suspect Yetus will complain that there are no tests - but this is a trivial typo introduced by a major rewrite of the script that is caught by the tests I'm trying to commit in HDFS-11096. > start-yarn.sh fails to start ResourceManager unless running as root > --- > > Key: YARN-7465 > URL: https://issues.apache.org/jira/browse/YARN-7465 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 3.1.0 >Reporter: Sean Mackrory >Priority: Blocker > Attachments: YARN-7465.001.patch > > > This was found when testing rolling upgrades in HDFS-11096. It manifests as > the following: > {quote}Starting resourcemanagers on [ container-8.docker container-9.docker] > /home/hadoop/hadoop-3.0.0-SNAPSHOT/sbin/../libexec/hadoop-functions.sh: line > 298: --config: command not found{quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7458) TestContainerManagerSecurity is still flakey
[ https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244901#comment-16244901 ] Daniel Templeton commented on YARN-7458: Couple more issues to fix while you're in there: # Probably safer to call {{ContainerState.COMPLETE.equals(...)}} on L416-417 # That catch on L421 is bad. It means that if we interrupt this test, it will ignore it and keep waiting. Probably better to put the catch outside the loop. > TestContainerManagerSecurity is still flakey > > > Key: YARN-7458 > URL: https://issues.apache.org/jira/browse/YARN-7458 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-7458.001.patch > > > YARN-6150 made this less flakey, but we're still seeing an occasional issue > here: > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
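The two review comments above can be illustrated with a small sketch: use the constant-first {{equals}} so a null container report cannot NPE, and attach the {{InterruptedException}} catch to a try that wraps the whole polling loop so an interrupt actually ends the wait. {{ContainerState}} and {{StatusSource}} here are stand-ins, not the YARN classes.

```java
/** Sketch of a null-safe, interrupt-respecting wait loop. */
public class WaitForContainerSketch {
    public enum ContainerState { NEW, RUNNING, COMPLETE }

    public interface StatusSource { ContainerState poll(); }

    public static boolean waitForComplete(StatusSource src, int maxAttempts) {
        try {
            for (int i = 0; i < maxAttempts; i++) {
                ContainerState state = src.poll(); // may be null
                // Constant-first equals: safe even when state is null.
                if (ContainerState.COMPLETE.equals(state)) {
                    return true;
                }
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            // Catch outside the loop: the interrupt stops the wait
            // instead of being swallowed and retried.
            Thread.currentThread().interrupt();
        }
        return false;
    }
}
```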
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244948#comment-16244948 ] Eric Yang commented on YARN-7430: - [~ebadger] On a Unix box, when a user runs sudo commands, all logs are written to syslog or /var/log/messages. They are owned by root. There are enterprise log aggregation tools that can search and filter out the segments of syslog and /var/log/messages belonging to a certain user by using the terminal id and audit id. The log viewer identifies the user based on terminal id and audit id to determine whether the user has rights to see the log. Hadoop doesn't have to be different from the existing design. The information generated by a root container should belong to root; in the event the user's sudo rights are revoked, he will not have access to the logs later. Docker console output is already appended to the container log; if we don't detach the container, all logs go into the container log. Therefore, we have logs that are compiled with the application id and container id. We have information available to determine if the user is allowed to see the logs. What log aggregation are we doing in addition to capturing the docker console output? If the application is writing to the file system directly without tracking, there will be no accurate way to identify the origin of the log. However, this is not a special case. This problem exists today for any shared service user, and it is up to the developer to generate logs that include the user name/host name in the log filename to support log tracking. I am not clear on how removing the {{\-\-user}} flag would result in log aggregation not working. Could you clarify? [~shaneku...@gmail.com], if passing --user=0:0 with the --privileged flag can keep log aggregation working, I have no objection to this. Is there a design for how log aggregation works for Yarn Services that differs from classic yarn containers? 
> User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforce group > correctly for launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244970#comment-16244970 ] Eric Yang edited comment on YARN-7430 at 11/9/17 12:13 AM: --- [~ebadger] They are two separate problems. A lot of the conversation here belongs to YARN-7446. This issue is to tackle the problem that we have an implicit privilege escalation security hole in the default shipped configuration when the following conditions are met: # Privileged container is enabled. # Deploy docker container with user mapping to a different uid:gid than host OS, or using a numeric username to launch app. # Data output from container is written as someone else or with root group ownership. In summary, to prevent privilege escalation, we should always pass in the primary group to improve security. was (Author: eyang): [~ebadger] They are two separate problems. A lot of conversation here belongs to YARN-7446. This issue is to tackle the problem that we have a implicit privilege escalation security hole in the default shipped configuration when the following condition is met: # Privileged container is enabled. # Deploy docker container with user mapping to a different uid:gid than host OS, or using a numeric username to launch app. # Data output from container is written with as someone else or root group. In summary, to prevent privileges escalation, we should always pass in primary group to improve security. > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. 
The code no longer enforce group > correctly for launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
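The remediation discussed above — always passing the numeric uid and primary gid to docker rather than a username — can be sketched as below. This is a hedged illustration; the class and method names are hypothetical, not the actual container-executor code.

```java
import java.util.ArrayList;
import java.util.List;

public class DockerRunArgs {
    /**
     * Build the user/group portion of a docker run command using numeric
     * ids, so the group is enforced even when the user does not exist
     * inside the image (e.g. LDAP users with no /etc/passwd entry).
     */
    static List<String> userArgs(int uid, int gid) {
        List<String> args = new ArrayList<>();
        // -u uid:gid sets both the user and the primary group numerically,
        // avoiding any username-to-uid translation inside the container.
        args.add("--user=" + uid + ":" + gid);
        return args;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", userArgs(1000, 1000)));
    }
}
```

With this form, files written by the containerized process carry the caller's uid:gid on the host, rather than root or another user's group.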
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245016#comment-16245016 ] Shane Kumpf commented on YARN-7430: --- IMO, I think this issue can be closed as invalid. Most of this does belong in YARN-7446 regarding the use of {{\-\-user}} and {{\-\-privileged}}, sorry for derailing the conversation. > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforce group > correctly for launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7386) Duplicate Strings in various places in Yarn memory
[ https://issues.apache.org/jira/browse/YARN-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245027#comment-16245027 ] Robert Kanter commented on YARN-7386: - The patch looks good to me. The Jenkins results are too old and the details have been lost, so I've kicked off another run. > Duplicate Strings in various places in Yarn memory > -- > > Key: YARN-7386 > URL: https://issues.apache.org/jira/browse/YARN-7386 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: YARN-7386.01.patch, YARN-7386.02.patch > > > Using jxray (www.jxray.com) I've analyzed a Yarn RM heap dump obtained in a > big cluster. The tool uncovered several sources of memory waste. One problem > is duplicate strings: > {code} > Total strings Unique strings Duplicate values > Overhead > 361,506 86,672 5,928 22,886K (7.6%) > {code} > They are spread across a number of locations. The biggest source of waste is > the following reference chain: > {code} > 7,416K (2.5%), 31292 / 62% dup strings (499 unique), 31292 dup backing arrays: > ↖{j.u.HashMap}.values > ↖org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.environment > ↖org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl.amContainer > ↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.submissionContext > ↖{java.util.concurrent.ConcurrentHashMap}.values > ↖org.apache.hadoop.yarn.server.resourcemanager.RMActiveServiceContext.applications > ↖org.apache.hadoop.yarn.server.resourcemanager.RMContextImpl.activeServiceContext > ↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor.rmContext > ↖Java Local@3ed9ef820 > (org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor) > {code} > However, there are also many others. Mostly they are strings in proto buffer > or proto buffer builder objects. 
I plan to get rid of at least the worst > offenders by inserting String.intern() calls. String.intern() used to consume > memory in PermGen and was not very scalable up until about the early JDK 7 > versions, but has greatly improved since then, and I've used it many times > without any issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
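The {{String.intern()}} approach described above can be illustrated with a small sketch (not the actual patch): interning equal values on insert makes them share one canonical instance, so only one backing array stays live.

```java
import java.util.HashMap;
import java.util.Map;

public class InternDemo {
    public static void main(String[] args) {
        // Simulate two container launch contexts carrying the same value
        // as distinct String objects (e.g. each deserialized from protobuf).
        Map<String, String> env1 = new HashMap<>();
        Map<String, String> env2 = new HashMap<>();
        env1.put("JAVA_HOME", new String("/usr/lib/jvm/java-8"));
        env2.put("JAVA_HOME", new String("/usr/lib/jvm/java-8"));

        // Equal content, but two distinct objects with two backing arrays.
        System.out.println(env1.get("JAVA_HOME") == env2.get("JAVA_HOME"));

        // Interning the values makes equal strings share one instance.
        env1.replaceAll((k, v) -> v.intern());
        env2.replaceAll((k, v) -> v.intern());
        System.out.println(env1.get("JAVA_HOME") == env2.get("JAVA_HOME"));
    }
}
```

Since JDK 7 the interned strings live on the regular heap rather than in PermGen, which is why the scalability concern mentioned above no longer applies.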
[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container
[ https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244915#comment-16244915 ] Eric Badger commented on YARN-7430: --- Also, this conversation seems to have morphed into a dup of YARN-7446. Are there 2 distinct issues here or should we close one as a dup of the other? > User and Group mapping are incorrect in docker container > > > Key: YARN-7430 > URL: https://issues.apache.org/jira/browse/YARN-7430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security, yarn >Affects Versions: 2.9.0, 3.0.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-7430.001.patch > > > In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to > enforce user and group for the running user. In YARN-6623, this translated > to --user=test --group-add=group1. The code no longer enforce group > correctly for launched process. > In addition, the implementation in YARN-6623 requires the user and group > information to exist in container to translate username and group to uid/gid. > For users on LDAP, there is no good way to populate container with user and > group information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent
[ https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244926#comment-16244926 ] Yufei Gu commented on YARN-7143: Looks good to me generally. Only one thing, the new {{initializedResources = true;}} isn't necessary since {{initializeResourcesMap()}} does that anyway. > FileNotFound handling in ResourceUtils is inconsistent > -- > > Key: YARN-7143 > URL: https://issues.apache.org/jira/browse/YARN-7143 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7143.002.patch, YARN-7143.YARN-3926.001.patch > > > When loading the resource-types.xml file, we warn and move on if it's not > found. When loading the node-resource.xml file, we abort loading resource > types if the file isn't found. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
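The warn-and-continue behavior that this issue wants applied consistently to both files can be sketched as a single shared loader. This is a hypothetical illustration, not the actual {{ResourceUtils}} code; the file names come from the issue description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

public class OptionalConfigLoader {
    /**
     * Load an optional config file (e.g. resource-types.xml or
     * node-resource.xml): warn and fall back to defaults when the file
     * is missing, instead of aborting the whole resource-type load.
     */
    static List<String> loadOrDefault(Path file, List<String> defaults) {
        try {
            return Files.readAllLines(file);
        } catch (NoSuchFileException e) {
            System.err.println("WARN: " + file + " not found, using defaults");
            return defaults;
        } catch (IOException e) {
            // A file that exists but cannot be read is a real error.
            throw new RuntimeException("failed to read " + file, e);
        }
    }

    public static void main(String[] args) {
        List<String> lines = loadOrDefault(Path.of("node-resource.xml"),
            List.of("memory-mb", "vcores"));
        System.out.println(lines);
    }
}
```

Routing both files through one loader like this removes the inconsistency by construction.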
[jira] [Updated] (YARN-7458) TestContainerManagerSecurity is still flakey
[ https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-7458: Attachment: YARN-7458.002.patch Thanks for the reviews. That all makes sense. Uploading 002 patch: - Replaced the custom loop with {{GenericTestUtils#waitFor}} and lowered the check interval to 10ms. This also makes the test fail if the loop expires before the container completes, and fixes the interrupt issue. - Reversed the {{equals}} call - Improved the log message to also print out the current container state for easier debugging > TestContainerManagerSecurity is still flakey > > > Key: YARN-7458 > URL: https://issues.apache.org/jira/browse/YARN-7458 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-7458.001.patch, YARN-7458.002.patch > > > YARN-6150 made this less flakey, but we're still seeing an occasional issue > here: > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
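The {{GenericTestUtils#waitFor}} pattern the patch adopts can be sketched roughly like this — a simplified stand-in, not the Hadoop implementation:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class WaitFor {
    /**
     * Poll the check every checkIntervalMs until it returns true or
     * timeoutMs elapses. Failing loudly on timeout (instead of falling
     * through silently) is what makes the test report the real problem.
     */
    static void waitFor(Supplier<Boolean> check, long checkIntervalMs,
            long timeoutMs) throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.get()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException(
                    "condition not met within " + timeoutMs + "ms");
            }
            Thread.sleep(checkIntervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~50ms; we poll every 10ms.
        waitFor(() -> System.currentTimeMillis() - start > 50, 10, 1000);
        System.out.println("condition met");
    }
}
```

Declaring {{InterruptedException}} instead of swallowing it also propagates interrupts correctly, which is the interrupt issue the patch mentions.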
[jira] [Commented] (YARN-7466) ResourceRequest has a different default for allocationRequestId than Container
[ https://issues.apache.org/jira/browse/YARN-7466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244988#comment-16244988 ] Jian He commented on YARN-7466: --- [~leftnoteasy], [~subru], any opinions on this? Should we make it consistent? > ResourceRequest has a different default for allocationRequestId than Container > -- > > Key: YARN-7466 > URL: https://issues.apache.org/jira/browse/YARN-7466 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chandni Singh >Assignee: Chandni Singh > > The default value of allocationRequestId is inconsistent. > It is -1 in {{ContainerProto}} but 0 in {{ResourceRequestProto}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
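The mismatch is easy to illustrate. The constants below mirror the proto defaults stated in the issue description (-1 in {{ContainerProto}}, 0 in {{ResourceRequestProto}}); the names are hypothetical, not the generated protobuf API.

```java
public class AllocationIdDefaults {
    // Default allocationRequestId in ContainerProto, per this issue.
    static final long CONTAINER_DEFAULT_ALLOCATION_REQUEST_ID = -1L;
    // Default allocationRequestId in ResourceRequestProto, per this issue.
    static final long RESOURCE_REQUEST_DEFAULT_ALLOCATION_REQUEST_ID = 0L;

    public static void main(String[] args) {
        // A client that matches allocated containers back to requests by
        // comparing the "unset" defaults would silently never match:
        System.out.println(CONTAINER_DEFAULT_ALLOCATION_REQUEST_ID
            == RESOURCE_REQUEST_DEFAULT_ALLOCATION_REQUEST_ID);
    }
}
```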
[jira] [Commented] (YARN-7465) start-yarn.sh fails to start ResourceManager unless running as root
[ https://issues.apache.org/jira/browse/YARN-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245012#comment-16245012 ] Hadoop QA commented on YARN-7465: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 13s{color} | {color:green} There were no new shelldocs issues. 
{color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 50s{color} | {color:green} hadoop-yarn in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 56m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7465 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896758/YARN-7465.001.patch | | Optional Tests | asflicense mvnsite unit shellcheck shelldocs | | uname | Linux 7093dba60e35 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cb35a59 | | maven | version: Apache Maven 3.3.9 | | shellcheck | v0.4.6 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18406/testReport/ | | Max. process+thread count | 339 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn U: hadoop-yarn-project/hadoop-yarn | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18406/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
> start-yarn.sh fails to start ResourceManager unless running as root > --- > > Key: YARN-7465 > URL: https://issues.apache.org/jira/browse/YARN-7465 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 3.1.0 >Reporter: Sean Mackrory >Priority: Blocker > Attachments: YARN-7465.001.patch > > > This was found when testing rolling upgrades in HDFS-11096. It manifests as > the following: > {quote}Starting resourcemanagers on [ container-8.docker container-9.docker] > /home/hadoop/hadoop-3.0.0-SNAPSHOT/sbin/../libexec/hadoop-functions.sh: line > 298: --config: command not found{quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7458) TestContainerManagerSecurity is still flakey
[ https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245021#comment-16245021 ] Daniel Templeton commented on YARN-7458: That's a lot of info level logging! Do we need that message printed every 10ms? > TestContainerManagerSecurity is still flakey > > > Key: YARN-7458 > URL: https://issues.apache.org/jira/browse/YARN-7458 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-7458.001.patch, YARN-7458.002.patch > > > YARN-6150 made this less flakey, but we're still seeing an occasional issue > here: > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7455) add_mounts can overrun temporary buffer
[ https://issues.apache.org/jira/browse/YARN-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244579#comment-16244579 ] Eric Yang commented on YARN-7455: - There is a max size check in add_mounts to prevent buffer overflow. The current size can hold a source and a target path of up to 510 characters each. Do we want to double it? Given that we don't add the black list into tmp_buffer, do we still need this? > add_mounts can overrun temporary buffer > --- > > Key: YARN-7455 > URL: https://issues.apache.org/jira/browse/YARN-7455 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0, 3.0.0 >Reporter: Jason Lowe > > While reviewing YARN-7197 I noticed that add_mounts in docker_util.c has a > potential buffer overflow since tmp_buffer is only 1024 bytes which may not > be sufficient to hold the specified mount path. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
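The bounds check under discussion can be sketched as follows — a Java analogue of the C length check, with hypothetical names; the real code lives in docker_util.c and works on a fixed 1024-byte tmp_buffer.

```java
public class BoundedMountBuffer {
    private final StringBuilder buf = new StringBuilder();
    private final int maxSize;

    BoundedMountBuffer(int maxSize) {
        this.maxSize = maxSize;
    }

    /**
     * Append "src:dst," only if it fits in the remaining capacity;
     * return false instead of overrunning the buffer.
     */
    boolean addMount(String src, String dst) {
        String entry = src + ":" + dst + ",";
        if (buf.length() + entry.length() > maxSize) {
            return false; // caller should fail the container launch
        }
        buf.append(entry);
        return true;
    }

    public static void main(String[] args) {
        BoundedMountBuffer b = new BoundedMountBuffer(1024);
        System.out.println(b.addMount("/var/lib/data", "/data"));
        // A pathological path longer than the whole buffer is rejected
        // rather than overflowing it.
        System.out.println(b.addMount("x".repeat(2048), "/data"));
    }
}
```

In C the equivalent is to compare the combined lengths against `sizeof(tmp_buffer)` before any `strcat`, or to size the buffer dynamically from the input lengths.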
[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244605#comment-16244605 ] Hadoop QA commented on YARN-7388: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 254 unchanged - 4 fixed = 254 total (was 258) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 0s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 13s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}114m 32s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7388 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896690/YARN-7388.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 97520fcb9a7b 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cb35a59 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/18402/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18402/testReport/ | | Max. process+thread count | 853 (vs. ulimit of 5000) | | modules | C:
[jira] [Commented] (YARN-7419) Implement Auto Queue Creation with modifications to queue mapping flow
[ https://issues.apache.org/jira/browse/YARN-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244601#comment-16244601 ] Wangda Tan commented on YARN-7419: -- Thanks [~suma.shivaprasad] for updating the patch, more comments: 1) CapacityScheduler: 1.1 Instead of fetching ApplicationPlacementContext from RMApp (to avoid perf/locking issues and keep the code flow clearer), you can add the ApplicationPlacementContext to {{AppAddedSchedulerEvent}}. Then the getPlacementContext API can be removed from RMApp. 1.2 Move the following if ... to addApplication: {code} if (placementContext != null) { // ... } {code} Like, {code} if (queue == null && placementContext != null) { //Could be a potential auto-created leaf queue } {code} Only enter the autoCreateLeafQueue function when necessary. 1.3 The following two catch blocks can be merged using the {{YarnException | IOException e}} multi-catch syntax. {code} catch (YarnException e) { LOG.error("Could not auto-create leaf queue due to : ", e); final String message = "Application " + applicationId + " submission by user " + user + " to queue: " + queueName + " failed : " + e.getMessage(); this.rmContext.getDispatcher().getEventHandler().handle( new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED, message)); } catch (IOException e) { final String message = "Application " + applicationId + " submission by user " + user + " to queue: " + queueName + " failed : " + e.getMessage(); LOG.error("Could not auto-create leaf queue due to : ", e); this.rmContext.getDispatcher().getEventHandler().handle( new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED, message)); } {code} 1.4 The following message is not clear enough: {code} String message = "Application " + applicationId + " submission by user " + user + " to queue: " + queueName + " failed : " + "Queue mapping does not exist for user"; {code} It should say directly that specifying an auto-created queue name is prohibited and that it has to be automatically mapped, etc. 
1.5 I'm not sure this check is necessary; I think the previous logic should be enough to detect this, correct? {code} else if (!queue.getParent().getQueueName().equals( placementContext.getParentQueue())) { String message = "Auto created Leaf queue " + placementContext.getQueue() + " already exists under " + queue .getParent().getQueuePath() + ".But Queue mapping has a different parent queue " + placementContext.getParentQueue() + " for the specified user : " + user; this.rmContext.getDispatcher().getEventHandler().handle( new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED, message)); return; } {code} 1.6 The clock is still here; move it to a separate patch? 2) CapacitySchedulerConfiguration: - getQueuePlacementRules is unused. - Make sure all newly added methods/fields are {{@Private}} - {{FAIL_AUTO_CREATION_ON_EXCEEDING_CAPACITY}} is this necessary? Should we just fail leaf queue creation when it exceeds the parent queue's limit? Renames: - AutoCreatedLeafQueueTemplate.Builder#capacity => capacities Unnecessary changes: - CapacitySchedulerContext - AbstractCSQueue Misc: - Did you accidentally include YARN-6124 in this patch? Could you revert that part? > Implement Auto Queue Creation with modifications to queue mapping flow > -- > > Key: YARN-7419 > URL: https://issues.apache.org/jira/browse/YARN-7419 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad > Attachments: YARN-7419.1.patch, YARN-7419.2.patch, YARN-7419.3.patch, > YARN-7419.patch > > > This involves changes to queue mapping flow to pass along context information > for auto queue creation. Auto creation of queues will be part of Capacity > Scheduler flow while attempting to resolve queues during application > submission. The leaf queues which do not exist are auto created under parent > queues which have been explicitly enabled for auto queue creation . 
In order > to determine which parent queue to create the leaf queues under - parent > queues need to be specified in queue mapping configuration -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
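The multi-catch merge suggested in review point 1.3 above collapses the duplicated handling into one block. Sketched here with standard exceptions standing in for {{YarnException}}, since that class lives in Hadoop:

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

public class MultiCatchDemo {
    static String submit(boolean ioFailure) {
        try {
            if (ioFailure) {
                throw new IOException("disk full");
            }
            throw new TimeoutException("scheduler busy");
        } catch (IOException | TimeoutException e) {
            // One handler for both exception types: log and reject,
            // instead of two byte-for-byte identical catch blocks.
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(submit(true));
        System.out.println(submit(false));
    }
}
```

Multi-catch (available since Java 7) requires that neither caught type is a subclass of the other, which holds for the two exceptions in the review comment.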
[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244622#comment-16244622 ] Haibo Chen commented on YARN-7388: -- I believe the OOM-led test failures are unrelated; let me retrigger Jenkins to double-check > TestAMRestart should be scheduler agnostic > -- > > Key: YARN-7388 > URL: https://issues.apache.org/jira/browse/YARN-7388 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: YARN-7388.00.patch, YARN-7388.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-7425) Failed to renew delegation token when RM user's TGT is expired
[ https://issues.apache.org/jira/browse/YARN-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie resolved YARN-7425. --- Resolution: Won't Fix > Failed to renew delegation token when RM user's TGT is expired > --- > > Key: YARN-7425 > URL: https://issues.apache.org/jira/browse/YARN-7425 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.2 >Reporter: Shen Yinjie >Assignee: Shen Yinjie >Priority: Critical > Attachments: rm_log.png > > > we have a secure hadoop cluster with namenode federation. > submit job fails after kerberos TGT maxLifeTime expired(default 24h), client > log shows" failed to renew token: HDFS_DELEGATION_TOKEN...". > check rm log, found rm tgt is expired but not triggers relogin(),just retry > and fail... > (rm log see screenshot) > digging in code: > when rm tries to renewToken(), > UserGroupInformation.getLoginUser()="rm", > but UserGroupInformation.getCurrentUser()="testUser". > this causes Client.shouldAuthenticateOverKrb() returns false, thus cant > trigger reloginFromKeytab() or reloginFromTicketCache(). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7406) Moving logging APIs over to slf4j in hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245217#comment-16245217 ] Bibin A Chundatt commented on YARN-7406: [~Cyl] Thank you for confirming. Still wondering how I missed {{ResourceUtils}}. Will commit the patch today. > Moving logging APIs over to slf4j in hadoop-yarn-api > > > Key: YARN-7406 > URL: https://issues.apache.org/jira/browse/YARN-7406 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Yeliang Cang >Assignee: Yeliang Cang > Attachments: YARN-7406.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent
[ https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-7143: --- Attachment: YARN-7143.003.patch > FileNotFound handling in ResourceUtils is inconsistent > -- > > Key: YARN-7143 > URL: https://issues.apache.org/jira/browse/YARN-7143 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7143.002.patch, YARN-7143.003.patch, > YARN-7143.YARN-3926.001.patch > > > When loading the resource-types.xml file, we warn and move on if it's not > found. When loading the node-resource.xml file, we abort loading resource > types if the file isn't found. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent
[ https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-7143: --- Attachment: (was: YARN-7143.003.patch) > FileNotFound handling in ResourceUtils is inconsistent > -- > > Key: YARN-7143 > URL: https://issues.apache.org/jira/browse/YARN-7143 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-7143.002.patch, YARN-7143.003.patch, > YARN-7143.YARN-3926.001.patch > > > When loading the resource-types.xml file, we warn and move on if it's not > found. When loading the node-resource.xml file, we abort loading resource > types if the file isn't found. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7437) Give SchedulingPlacementSet to a better name.
[ https://issues.apache.org/jira/browse/YARN-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-7437: - Attachment: YARN-7437.005.patch Fixed some checkstyle issues before committing. Uploading patch here first to make sure Jenkins is OK. > Give SchedulingPlacementSet to a better name. > - > > Key: YARN-7437 > URL: https://issues.apache.org/jira/browse/YARN-7437 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-7437.001.patch, YARN-7437.002.patch, > YARN-7437.003.patch, YARN-7437.004.patch, YARN-7437.005.patch > > > Currently, the SchedulingPlacementSet is very confusing. Here're its > responsibilities: > 1) Store ResourceRequests. (Or SchedulingRequest after YARN-6592). > 2) Decide order of nodes to allocate when there're multiple node candidates. > 3) Decide if we should reject node for given requests. > 4) Store any states/cache can help make decision for #2/#3 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7413) Support resource type in SLS
[ https://issues.apache.org/jira/browse/YARN-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-7413: --- Attachment: YARN-7413.003.patch Uploaded patch v3 to fix the style and whitespace issues. > Support resource type in SLS > > > Key: YARN-7413 > URL: https://issues.apache.org/jira/browse/YARN-7413 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-7413.001.patch, YARN-7413.002.patch, > YARN-7413.003.patch > >
[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245042#comment-16245042 ] Haibo Chen commented on YARN-7388: -- The unit test failure is unrelated; it is tracked at YARN-5684. > TestAMRestart should be scheduler agnostic > -- > > Key: YARN-7388 > URL: https://issues.apache.org/jira/browse/YARN-7388 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: YARN-7388.00.patch, YARN-7388.01.patch > >
[jira] [Updated] (YARN-7458) TestContainerManagerSecurity is still flakey
[ https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-7458: Attachment: YARN-7458.003.patch It wasn't too spammy on my computer, but I guess it could be on something slower. The 003 patch removes that log message. Instead, we log once before starting the {{waitFor}} and also if there's a {{TimeoutException}}, to make things clearer. > TestContainerManagerSecurity is still flakey > > > Key: YARN-7458 > URL: https://issues.apache.org/jira/browse/YARN-7458 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-7458.001.patch, YARN-7458.002.patch, > YARN-7458.003.patch > > > YARN-6150 made this less flakey, but we're still seeing an occasional issue > here: > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167) > {noformat}
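The logging change described in that comment can be sketched as follows. This is a simplified stand-in for the test's wait loop, not the actual TestContainerManagerSecurity code (Hadoop tests typically delegate this pattern to a `waitFor` utility): log once before polling, stay quiet inside the loop, and log again only when the wait times out.

```java
import java.util.function.BooleanSupplier;

public class QuietWait {
    // Poll `check` every intervalMs until it returns true or timeoutMs
    // elapses. Logs once up front and once on timeout -- no per-iteration
    // spam, even on a slow machine.
    static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws InterruptedException {
        System.out.println("Waiting up to " + timeoutMs + " ms for container to finish");
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        System.out.println("Timed out waiting for container to finish");
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(waitFor(() -> true, 10, 1000)); // prints true
    }
}
```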
[jira] [Commented] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent
[ https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245085#comment-16245085 ] Hadoop QA commented on YARN-7143: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 2s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in trunk has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 40m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7143 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12896775/YARN-7143.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f90468f1a9b9 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0de1068 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/18410/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18410/testReport/ | | Max. process+thread count | 391 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api |
[jira] [Commented] (YARN-7458) TestContainerManagerSecurity is still flakey
[ https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245106#comment-16245106 ] Hudson commented on YARN-7458: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13206 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13206/]) YARN-7458. TestContainerManagerSecurity is still flakey (Contributed by (templedf: rev 49b4c0b334e5472dbbf71b042a6a6b1d4b2ce3b7) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java > TestContainerManagerSecurity is still flakey > > > Key: YARN-7458 > URL: https://issues.apache.org/jira/browse/YARN-7458 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 3.0.0, 3.1.0 > > Attachments: YARN-7458.001.patch, YARN-7458.002.patch, > YARN-7458.003.patch > > > YARN-6150 made this less flakey, but we're still seeing an occasional issue > here: > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356) > at > org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167) > {noformat}