[jira] [Comment Edited] (YARN-9957) The first container we recover may not be the AM
[ https://issues.apache.org/jira/browse/YARN-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969006#comment-16969006 ]

Xianghao Lu edited comment on YARN-9957 at 11/7/19 7:47 AM:
---------------------------------------------------------------

The failure above has nothing to do with the patch. [~asuresh] [~rkanter], can you please review the patch?

was (Author: luxianghao): The failure above has nothing to do with the patch. [~arun.sur...@gmail.com] [~rkanter], can you please review the patch?

> The first container we recover may not be the AM
> ------------------------------------------------
>
>                 Key: YARN-9957
>                 URL: https://issues.apache.org/jira/browse/YARN-9957
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.9.1
>            Reporter: Xianghao Lu
>            Assignee: Xianghao Lu
>            Priority: Major
>             Fix For: 2.9.1
>
>         Attachments: 1.jpg, 2.jpg, YARN-9957-branch-2.9.1.001.patch
>
> YARN-7382 says that if not running unmanaged, the first container we recover
> is always the AM. However, in practice this is not always the case, which can
> lead to wrong AM resource usage after RM recovery.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9957) The first container we recover may not be the AM
[ https://issues.apache.org/jira/browse/YARN-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969006#comment-16969006 ]

Xianghao Lu commented on YARN-9957:
---

The failure above has nothing to do with the patch. [~arun.sur...@gmail.com] [~rkanter], can you please review the patch?
[jira] [Commented] (YARN-9957) The first container we recover may not be the AM
[ https://issues.apache.org/jira/browse/YARN-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968999#comment-16968999 ]

Hadoop QA commented on YARN-9957:
---

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 21m 50s | Docker failed to build yetus/hadoop:ef54f78530d. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9957 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985156/YARN-9957-branch-2.9.1.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25112/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-9957) The first container we recover may not be the AM
[ https://issues.apache.org/jira/browse/YARN-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968976#comment-16968976 ]

Xianghao Lu commented on YARN-9957:
---

The test sample is as described below:
# Submit an MR job: the AM container capacity is 2048 MB, the map container capacity is 4096 MB, and the reduce container capacity is 4096 MB. Resource usage is shown in 1.jpg.
# Restart the RM. After the RM recovered, resource usage is shown in 2.jpg, and I found the AM used resource is 4096 MB.

!1.jpg!
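The repro above (a 2048 MB AM container recovered alongside 4096 MB task containers) can be sketched outside of YARN. The following is a hypothetical, self-contained illustration of the fix's idea — matching each recovered container against the attempt's known master container ID instead of assuming the first recovered container is the AM. All class, method, and field names here are invented for this sketch; this is not the actual FairScheduler recovery code.

```java
import java.util.Arrays;
import java.util.List;

public class AmRecoverySketch {
    public static final class Container {
        public final long id;       // container ID within the attempt (invented for this sketch)
        public final int memoryMb;  // container memory allocation
        public Container(long id, int memoryMb) { this.id = id; this.memoryMb = memoryMb; }
    }

    // Buggy assumption: the first recovered container is the AM.
    public static int amUsageAssumingFirstIsAm(List<Container> recovered) {
        return recovered.isEmpty() ? 0 : recovered.get(0).memoryMb;
    }

    // Safer: match the attempt's known master (AM) container ID.
    public static int amUsageByMasterContainerId(List<Container> recovered, long masterId) {
        for (Container c : recovered) {
            if (c.id == masterId) {
                return c.memoryMb;
            }
        }
        return 0; // AM container not (yet) recovered
    }

    public static void main(String[] args) {
        // A map container (4096 MB) comes back from the state store before the AM (2048 MB).
        List<Container> recovered =
            Arrays.asList(new Container(2, 4096), new Container(1, 2048));
        System.out.println(amUsageAssumingFirstIsAm(recovered));      // 4096 -- the wrong AM usage
        System.out.println(amUsageByMasterContainerId(recovered, 1)); // 2048 -- the expected AM usage
    }
}
```

With recovery order unspecified by the state store, only the ID-based match reproduces the correct 2048 MB AM usage from the scenario above.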
[jira] [Updated] (YARN-9957) The first container we recover may not be the AM
[ https://issues.apache.org/jira/browse/YARN-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianghao Lu updated YARN-9957:
---
    Attachment: 2.jpg
                1.jpg
[jira] [Commented] (YARN-9956) Improve connection error message for YARN ApiServerClient
[ https://issues.apache.org/jira/browse/YARN-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968973#comment-16968973 ]

Prabhu Joseph commented on YARN-9956:
---

[~eyang] Yes, sure, will work on this. Thanks.

> Improve connection error message for YARN ApiServerClient
> ---------------------------------------------------------
>
>                 Key: YARN-9956
>                 URL: https://issues.apache.org/jira/browse/YARN-9956
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Yang
>            Priority: Major
>
> In HA environment, yarn.resourcemanager.webapp.address configuration is
> optional. ApiServiceClient may produce confusing error message like this:
> {code}
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host1.example.com:8090
> 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host2.example.com:8090
> 19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {}
> GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
>         at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771)
>         at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266)
>         at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
>         at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125)
>         at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105)
>         at java.base/java.security.AccessController.doPrivileged(Native Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>         at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105)
>         at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290)
>         at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>         at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>         at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
> Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
>         at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
>         at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251)
>         at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262)
>         at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308)
>         at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126)
>         at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
>         at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
>         ... 15 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>         at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
>         at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
>         at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
>         at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
>         ... 21 more
> 19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application: java.io.IOException: java.lang.reflect.UndeclaredThrowableException
>         at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293)
>         at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271)
>         at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416)
>         at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125)
> Caused by: java.lang.reflect.UndeclaredThrowableException
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
[jira] [Created] (YARN-9957) The first container we recover may not be the AM
Xianghao Lu created YARN-9957:
---------------------------------

             Summary: The first container we recover may not be the AM
                 Key: YARN-9957
                 URL: https://issues.apache.org/jira/browse/YARN-9957
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 2.9.1
            Reporter: Xianghao Lu
            Assignee: Xianghao Lu
             Fix For: 2.9.1

YARN-7382 says that if not running unmanaged, the first container we recover is always the AM. However, in practice this is not always the case, which can lead to wrong AM resource usage after RM recovery.
[jira] [Assigned] (YARN-9956) Improve connection error message for YARN ApiServerClient
[ https://issues.apache.org/jira/browse/YARN-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph reassigned YARN-9956:
---
    Assignee: Prabhu Joseph
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968947#comment-16968947 ]

zhoukang commented on YARN-9537:
---

[~tangzhankun] Could you help review this? Thanks.

> Add configuration to disable AM preemption
> ------------------------------------------
>
>                 Key: YARN-9537
>                 URL: https://issues.apache.org/jira/browse/YARN-9537
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.2.0, 3.1.2
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>
>         Attachments: YARN-9537-002.patch, YARN-9537.001.patch, YARN-9537.003.patch
>
> In this issue, I will add a configuration to support disabling AM preemption.
[jira] [Assigned] (YARN-9612) Support using ip to register NodeID
[ https://issues.apache.org/jira/browse/YARN-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang reassigned YARN-9612:
---
    Assignee: zhoukang

> Support using ip to register NodeID
> -----------------------------------
>
>                 Key: YARN-9612
>                 URL: https://issues.apache.org/jira/browse/YARN-9612
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>
> In environments like Kubernetes, we should support using the IP when registering the NodeID with the RM, since the hostname will be the pod name, which cannot be resolved by the Kubernetes DNS.
[jira] [Assigned] (YARN-9739) appsTableData in AppsBlock may cause OOM
[ https://issues.apache.org/jira/browse/YARN-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang reassigned YARN-9739:
---
    Assignee: zhoukang

> appsTableData in AppsBlock may cause OOM
> ----------------------------------------
>
>                 Key: YARN-9739
>                 URL: https://issues.apache.org/jira/browse/YARN-9739
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>
>         Attachments: heap0.png, heap1.png, stack.png
>
> If many users list the applications, it may cause an OOM in the RM.
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968848#comment-16968848 ]

Íñigo Goiri commented on YARN-9768:
---

+1 on [^YARN-9768.008.patch].

> RM Renew Delegation token thread should timeout and retry
> ---------------------------------------------------------
>
>                 Key: YARN-9768
>                 URL: https://issues.apache.org/jira/browse/YARN-9768
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: CR Hota
>            Assignee: Manikandan R
>            Priority: Major
>
>         Attachments: YARN-9768.001.patch, YARN-9768.002.patch, YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews
> the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which has the exact
> same APIs as an HDFS NN). If one of the nodes is bad and the renew call gets stuck,
> the thread remains stuck indefinitely. The thread should ideally time out the
> renewToken call and retry from the client's perspective.
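The behavior proposed in the description — bounding a potentially hanging renew call with a timeout and retrying — can be sketched generically with `java.util.concurrent`. This is an illustrative pattern only: the `Callable` stands in for the real renew RPC, and the class and method names are invented for this sketch, not the actual DelegationTokenRenewer change.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedRenewSketch {
    // Run a (possibly hanging) renew call with a per-attempt timeout and retry.
    // Returns the new expiration time reported by the renew call.
    public static long renewWithTimeout(Callable<Long> renewCall, long timeoutSec, int maxRetries) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            for (int attempt = 1; attempt <= maxRetries; attempt++) {
                Future<Long> f = pool.submit(renewCall);
                try {
                    return f.get(timeoutSec, TimeUnit.SECONDS);
                } catch (TimeoutException e) {
                    f.cancel(true); // interrupt the stuck call, then retry
                } catch (Exception e) {
                    throw new RuntimeException("renew failed", e);
                }
            }
            throw new RuntimeException("renew timed out after " + maxRetries + " attempts");
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A healthy renew call returns promptly with a new expiry timestamp.
        System.out.println(renewWithTimeout(() -> 1234L, 5, 3)); // 1234
    }
}
```

Running the renew on a separate thread is what makes the timeout enforceable: the caller's thread is released by `Future.get(timeout)` even if the RPC to a bad NN or Router never returns.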
[jira] [Commented] (YARN-9937) Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo
[ https://issues.apache.org/jira/browse/YARN-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968792#comment-16968792 ]

Hadoop QA commented on YARN-9937:
---

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 36m 16s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| trunk Compile Tests ||
| +1 | mvninstall | 23m 20s | trunk passed |
| +1 | compile | 1m 0s | trunk passed |
| +1 | checkstyle | 0m 48s | trunk passed |
| +1 | mvnsite | 1m 4s | trunk passed |
| +1 | shadedclient | 16m 45s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 34s | trunk passed |
| +1 | javadoc | 0m 40s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 58s | the patch passed |
| +1 | compile | 0m 51s | the patch passed |
| +1 | javac | 0m 51s | the patch passed |
| +1 | checkstyle | 0m 35s | the patch passed |
| +1 | mvnsite | 0m 50s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 43s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 39s | the patch passed |
| +1 | javadoc | 0m 40s | the patch passed |
|| Other Tests ||
| +1 | unit | 98m 12s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 51s | The patch does not generate ASF License warnings. |
| | | 202m 44s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9937 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985112/YARN-9937-addendum-01.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 5dfdf54c1216 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dd90025 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25111/testReport/ |
| Max. process+thread count | 830 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25111/console |
| Powered by | Apache Yetus
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968732#comment-16968732 ]

Eric Payne commented on YARN-8292:
---

It looks like the extended resource functionality for the preemption unit tests may depend on changes made in YARN-7411 to ProportionalCapacityPreemptionPolicyMockFramework#parseResourceFromString. YARN-7411 was only put into 3.1.0.

> Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8292
>                 URL: https://issues.apache.org/jira/browse/YARN-8292
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Sumana Sathish
>            Assignee: Wangda Tan
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8292.001.patch, YARN-8292.002.patch, YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, YARN-8292.006.patch, YARN-8292.007.patch, YARN-8292.008.patch, YARN-8292.009.patch, YARN-8292.branch-2.009.patch
>
> This is an example of the problem:
> {code}
> //   guaranteed, max,     used,   pending
> "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + // root
> "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" +       // a
> "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" +       // b
> "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])";         // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For each of a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resource and has 1:1:1 pending resource.
> Under the existing logic, preemption cannot happen.
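The vector arithmetic behind the example in the issue description can be illustrated in isolation. One plausible way the computation goes wrong: when the "still to obtain" vector turns negative in one component (a surplus of a non-dominant resource), a check that requires the whole vector to be positive wrongly concludes nothing needs preempting, whereas clamping negative components to zero first keeps the positive components visible. The method names below are invented for this sketch, and it is not the actual ProportionalCapacityPreemptionPolicy code.

```java
import java.util.Arrays;

public class PreemptionVectorSketch {
    // Strict check: proceed only if every component is still positive.
    // A negative (surplus) component makes this wrongly report "nothing to obtain".
    public static boolean allPositive(int[] v) {
        for (int x : v) if (x <= 0) return false;
        return true;
    }

    // Any component still positive means something is left to obtain.
    public static boolean anyPositive(int[] v) {
        for (int x : v) if (x > 0) return true;
        return false;
    }

    // Component-wise max(v, 0): normalize surpluses away before comparing.
    public static int[] clampToZero(int[] v) {
        int[] out = new int[v.length];
        for (int i = 0; i < v.length; i++) out[i] = Math.max(v[i], 0);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical "still to obtain" vector: one more unit of the first two
        // resources is needed; the third resource is in surplus (negative).
        int[] toObtain = {1, 1, -2};
        System.out.println(allPositive(toObtain));                  // false -- strict check blocks preemption
        System.out.println(anyPositive(clampToZero(toObtain)));     // true  -- clamped check allows it
        System.out.println(Arrays.toString(clampToZero(toObtain))); // [1, 1, 0]
    }
}
```

The clamped form matches the intuition of the example: queue c's 1:1:1 pending demand should still trigger preemption on the resources that are genuinely short, regardless of a surplus in another component.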
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968656#comment-16968656 ]

Eric Payne commented on YARN-8292:
---

I backported this to branch-2 and attached YARN-8292.branch-2.009.patch. In my manual tests on a 4-node pseudo cluster, it allows preemptions to proceed in the case where the dominant resource is above the queue capacity and the non-dominant resource(s) is (are) less. However, I have not put the JIRA into patch-submitted state because the two unit tests added to test preemption with 3 resources are failing.

I dug into it a little bit and see that in 2.10, when it allocates resources to the mock queue, the extended resource is not added to the current configuration or usage of the queue. [~leftnoteasy] / [~sunilg] / [~jhung], are you aware of any missing extended resource configuration that should be backported for the 2.10 RM / CS mocks?

Here is one of the test failures:
{noformat}
[ERROR] TestProportionalCapacityPreemptionPolicyInterQueueWithDRF.test3ResourceTypesInterQueuePreemption:117
Wanted but not invoked:
eventHandler.handle( );
-> at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF.test3ResourceTypesInterQueuePreemption(TestProportionalCapacityPreemptionPolicyInterQueueWithDRF.java:117)
Actually, there were zero interactions with this mock.
{noformat}
[jira] [Updated] (YARN-9937) Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo
[ https://issues.apache.org/jira/browse/YARN-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-9937:
---
    Attachment: YARN-9937-addendum-01.patch

> Add missing queue configs in RMWebService#CapacitySchedulerQueueInfo
> --------------------------------------------------------------------
>
>                 Key: YARN-9937
>                 URL: https://issues.apache.org/jira/browse/YARN-9937
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: Screen Shot 2019-10-28 at 8.54.53 PM.png, YARN-9937-001.patch, YARN-9937-002.patch, YARN-9937-003.patch, YARN-9937-004.patch, YARN-9937-addendum-01.patch, YARN-9937-branch-3.2.001.patch, YARN-9937-branch-3.2.002.patch
>
> Below are the missing queue configs which are not part of RMWebServices scheduler endpoint:
> 1. Maximum Allocation
> 2. Queue ACLs
> 3. Queue Priority
> 4. Application Lifetime
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-8292:
---
    Attachment: YARN-8292.branch-2.009.patch
[jira] [Commented] (YARN-9956) Improve connection error message for YARN ApiServerClient
[ https://issues.apache.org/jira/browse/YARN-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968632#comment-16968632 ] Eric Yang commented on YARN-9956: - [~prabhujoseph] can you help out with this issue? Thanks > Improve connection error message for YARN ApiServerClient > - > > Key: YARN-9956 > URL: https://issues.apache.org/jira/browse/YARN-9956 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Yang >Priority: Major > > In HA environment, yarn.resourcemanager.webapp.address configuration is > optional. ApiServiceClient may produce confusing error message like this: > {code} > 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: > host1.example.com:8090 > 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: > host2.example.com:8090 > 19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms > 19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {} > GSSException: No valid credentials provided (Mechanism level: Server not > found in Kerberos database (7) - LOOKING_UP_SERVER) > at > java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771) > at > java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266) > at > java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290) > at > 
org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125) > Caused by: KrbException: Server not found in Kerberos database (7) - > LOOKING_UP_SERVER > at > java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73) > at > java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251) > at > java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262) > at > java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308) > at > java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126) > at > java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458) > at > java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695) > ... 15 more > Caused by: KrbException: Identifier doesn't match expected value (906) > at > java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140) > at > java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65) > at > java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60) > at > java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55) > ...
21 more > 19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application: > java.io.IOException: java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125) > Caused by: java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
[jira] [Commented] (YARN-9956) Improve connection error message for YARN ApiServerClient
[ https://issues.apache.org/jira/browse/YARN-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968629#comment-16968629 ] Eric Yang commented on YARN-9956: - Krb5.log evidence shows ApiServiceClient attempting to acquire a TGS for both resource managers and also for the non-existent RM. {code} Oct 30 22:01:41 host1.example.com krb5kdc[4157](info): TGS_REQ (8 etypes {18 17 20 19 16 23 1 3}) 172.27.135.195: ISSUE: authtime 1572472015, etypes {rep=16 tkt=16 ses=16}, hb...@example.com for HTTP/host1.example@example.com Oct 30 22:01:41 host1.example.com krb5kdc[4157](info): TGS_REQ (8 etypes {18 17 20 19 16 23 1 3}) 172.27.135.195: ISSUE: authtime 1572472015, etypes {rep=16 tkt=16 ses=16}, hb...@example.com for krbtgt/example@example.com Oct 30 22:01:42 host1.example.com krb5kdc[4157](info): TGS_REQ (8 etypes {18 17 20 19 16 23 1 3}) 172.27.135.195: ISSUE: authtime 1572472015, etypes {rep=16 tkt=16 ses=16}, hb...@example.com for HTTP/host2.example@example.com Oct 30 22:01:42 host1.example.com krb5kdc[4157](info): TGS_REQ (8 etypes {18 17 20 19 16 23 1 3}) 172.27.135.195: ISSUE: authtime 1572472015, etypes {rep=16 tkt=16 ses=16}, hb...@example.com for krbtgt/example@example.com Oct 30 22:01:42 host1.example.com krb5kdc[4157](info): TGS_REQ (8 etypes {18 17 20 19 16 23 1 3}) 172.27.135.195: LOOKING_UP_SERVER: authtime 0, hb...@example.com for HTTP/0.0@example.com, Server not found in Kerberos database {code} > Improve connection error message for YARN ApiServerClient > - > > Key: YARN-9956 > URL: https://issues.apache.org/jira/browse/YARN-9956 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Yang >Priority: Major > > In HA environment, yarn.resourcemanager.webapp.address configuration is > optional. 
ApiServiceClient may produce confusing error message like this: > {code} > 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: > host1.example.com:8090 > 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: > host2.example.com:8090 > 19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms > 19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {} > GSSException: No valid credentials provided (Mechanism level: Server not > found in Kerberos database (7) - LOOKING_UP_SERVER) > at > java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771) > at > java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266) > at > java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271) > at > org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at > org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125) > Caused by: KrbException: Server not found in Kerberos 
database (7) - > LOOKING_UP_SERVER > at > java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73) > at > java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251) > at > java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262) > at > java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308) > at > java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126) > at > java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458) > at > java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695) > ... 15 more > Caused by: KrbException: Identifier doesn't match expected value (906) > at >
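The final LOOKING_UP_SERVER entry suggests the client derived a service principal from a placeholder or unresolved address rather than a real RM hostname. A minimal sketch of that failure mode and of the kind of fail-fast check the issue title asks for; the names here are hypothetical, and this is not ApiServiceClient's actual code:

```java
// Hypothetical sketch: the SPNEGO service principal is derived from the
// configured webapp address, so an unset/default address like "0.0.0.0:8090"
// yields a principal no KDC can resolve (LOOKING_UP_SERVER). Checking the
// address up front allows a clearer error message than a raw GSSException.
public class SpnSketch {

    // Derive the HTTP service principal name from a host:port address.
    public static String spnFor(String webappAddress, String realm) {
        String host = webappAddress.split(":")[0];
        return "HTTP/" + host + "@" + realm;
    }

    // Return an actionable error message for obviously bad addresses,
    // or null when the address looks usable for Kerberos.
    public static String validate(String webappAddress) {
        String host = webappAddress.split(":")[0];
        if (host.isEmpty() || host.equals("0.0.0.0")) {
            return "webapp address resolves to '" + host
                + "'; configure a real RM hostname so a valid "
                + "service principal can be constructed";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(spnFor("0.0.0.0:8090", "EXAMPLE.COM"));
        System.out.println(validate("0.0.0.0:8090"));
        System.out.println(validate("host1.example.com:8090"));
    }
}
```

Failing fast on the bad address would replace the confusing KDC-level stack trace with a message pointing at the misconfiguration.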
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968628#comment-16968628 ] Hadoop QA commented on YARN-9768: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 39s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 46s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}180m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9768 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985083/YARN-9768.008.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 29be7dd0f16b 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Created] (YARN-9956) Improve connection error message for YARN ApiServerClient
Eric Yang created YARN-9956: --- Summary: Improve connection error message for YARN ApiServerClient Key: YARN-9956 URL: https://issues.apache.org/jira/browse/YARN-9956 Project: Hadoop YARN Issue Type: Bug Reporter: Eric Yang In HA environment, yarn.resourcemanager.webapp.address configuration is optional. ApiServiceClient may produce confusing error message like this: {code} 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host1.example.com:8090 19/10/30 20:13:42 INFO client.ApiServiceClient: Fail to connect to: host2.example.com:8090 19/10/30 20:13:42 INFO util.log: Logging initialized @2301ms 19/10/30 20:13:42 ERROR client.ApiServiceClient: Error: {} GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER) at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:771) at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266) at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196) at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:125) at org.apache.hadoop.yarn.service.client.ApiServiceClient$1.run(ApiServiceClient.java:105) at java.base/java.security.AccessController.doPrivileged(Native Method) at java.base/javax.security.auth.Subject.doAs(Subject.java:423) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105) at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290) at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271) at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589) at 
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125) Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73) at java.security.jgss/sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:251) at java.security.jgss/sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:262) at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308) at java.security.jgss/sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126) at java.security.jgss/sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458) at java.security.jgss/sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695) ... 15 more Caused by: KrbException: Identifier doesn't match expected value (906) at java.security.jgss/sun.security.krb5.internal.KDCRep.init(KDCRep.java:140) at java.security.jgss/sun.security.krb5.internal.TGSRep.init(TGSRep.java:65) at java.security.jgss/sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60) at java.security.jgss/sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55) ...
21 more 19/10/30 20:13:42 ERROR client.ApiServiceClient: Fail to launch application: java.io.IOException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:293) at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:271) at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:416) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:589) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:125) Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894) at org.apache.hadoop.yarn.service.client.ApiServiceClient.generateToken(ApiServiceClient.java:105) at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:290) ... 6 more Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Server
[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968539#comment-16968539 ] Shane Kumpf commented on YARN-9562: --- bq. So you're setting linux-container-executor.nonsecure-mode.limit-users to true with linux-container-executor.nonsecure-mode.local-user set to nobody in your yarn-site.xml? Is that the use case here? Exactly correct > Add Java changes for the new RuncContainerRuntime > - > > Key: YARN-9562 > URL: https://issues.apache.org/jira/browse/YARN-9562 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9562.001.patch, YARN-9562.002.patch, > YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, > YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, > YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, > YARN-9562.012.patch, YARN-9562.013.patch > > > This JIRA will be used to add the Java changes for the new > RuncContainerRuntime. This will work off of YARN-9560 to use much of the > existing DockerLinuxContainerRuntime code once it is moved up into an > abstract class that can be extended.
[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968537#comment-16968537 ] Jim Brennan commented on YARN-9562: --- [~ebadger], [~shaneku...@gmail.com] for the record, I ran with {{linux-container-executor.nonsecure-mode.limit-users=false}} > Add Java changes for the new RuncContainerRuntime > - > > Key: YARN-9562 > URL: https://issues.apache.org/jira/browse/YARN-9562 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9562.001.patch, YARN-9562.002.patch, > YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, > YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, > YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, > YARN-9562.012.patch, YARN-9562.013.patch > > > This JIRA will be used to add the Java changes for the new > RuncContainerRuntime. This will work off of YARN-9560 to use much of the > existing DockerLinuxContainerRuntime code once it is moved up into an > abstract class that can be extended.
[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968529#comment-16968529 ] Eric Badger commented on YARN-9562: --- [~shaneku...@gmail.com], thanks for the review! bq. I am running all containers as the nobody user in this case. So you're setting {{linux-container-executor.nonsecure-mode.limit-users}} to true with {{linux-container-executor.nonsecure-mode.local-user}} set to nobody in your yarn-site.xml? Is that the use case here? I'll address the rest of your comments in a followup patch. > Add Java changes for the new RuncContainerRuntime > - > > Key: YARN-9562 > URL: https://issues.apache.org/jira/browse/YARN-9562 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9562.001.patch, YARN-9562.002.patch, > YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, > YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, > YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, > YARN-9562.012.patch, YARN-9562.013.patch > > > This JIRA will be used to add the Java changes for the new > RuncContainerRuntime. This will work off of YARN-9560 to use much of the > existing DockerLinuxContainerRuntime code once it is moved up into an > abstract class that can be extended.
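For readers following the exchange above: the two settings are abbreviated in the comments. A sketch of the yarn-site.xml shape being discussed, using the full property names as they appear in yarn-default.xml (verify names and defaults against your Hadoop version):

```xml
<!-- Sketch of the nonsecure-mode settings discussed above. -->
<property>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
  <value>nobody</value>
</property>
```

With limit-users set to true, all containers run as the configured local-user (nobody here); Jim Brennan's test above instead ran with limit-users=false, so containers ran as the submitting user.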
[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9768: --- Attachment: YARN-9768.008.patch > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9768.001.patch, YARN-9768.002.patch, > YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, > YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch > > > The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews > received HDFS tokens to check their validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has the same > APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the > thread remains stuck indefinitely. The thread should ideally time out the > renewToken call and retry from the client's perspective. >
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968482#comment-16968482 ] Manikandan R commented on YARN-9768: Taken care. > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9768.001.patch, YARN-9768.002.patch, > YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, > YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch > > > The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews > received HDFS tokens to check their validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has the same > APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the > thread remains stuck indefinitely. The thread should ideally time out the > renewToken call and retry from the client's perspective. >
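The timeout-and-retry behaviour this issue asks for can be sketched by running the potentially blocking renew call under a bounded Future. This is an illustrative shape under simplified assumptions, not the actual DelegationTokenRenewer patch; the Renewer interface and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: run a blocking token-renew call on a worker thread, bound each
// attempt with a timeout, and retry a bounded number of times so the
// renewer thread itself can never hang indefinitely on one bad NN/Router.
public class TimedRenewSketch {

    // Stand-in for the blocking renew RPC (hypothetical interface).
    public interface Renewer {
        long renew() throws Exception;
    }

    public static long renewWithTimeout(Renewer r, long timeoutMs, int retries)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Exception last = null;
            for (int attempt = 0; attempt <= retries; attempt++) {
                Future<Long> f = pool.submit(r::renew);
                try {
                    // Bound the attempt; a stuck RPC no longer blocks us.
                    return f.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    f.cancel(true); // interrupt the stuck call, then retry
                    last = e;
                }
            }
            throw last;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A renewer that responds immediately.
        System.out.println(renewWithTimeout(() -> 42L, 1000, 2));
    }
}
```

The key design point is that the timeout lives on the caller's side: even if the NN or Router never responds, each attempt is abandoned after timeoutMs and the loop either retries or surfaces the TimeoutException.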
[jira] [Commented] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968409#comment-16968409 ] Adam Antal commented on YARN-9011: -- Thanks for the patch, [~pbacsko]. I am not entirely sure that I perfectly understood this patch, but the discussion above convinced me that this is the right approach. I think it would be a good idea to stick to the generic naming convention by renaming {{HostsFileReader$refresh(String,String,boolean)}} to {{refreshInternal}}. Could you please do that to make the class clearer? I was a bit concerned about what happens if an unchecked exception occurs inside {{NodesListManager$handleExcludeNodeList}} (between calling {{HostsFileReader$lazyRefresh}} and {{HostsFileReader$finishRefresh}}), but I am assured that the internal structure will not get damaged by this. Also, a side question: why did you move the following line inside {{ResourceTrackerService$nodeHeartbeat}}? Since the same object is received, and it is not possible for this object to be added or removed between the two calls, I would not touch it. {code:java} RMNode rmNode = this.rmContext.getRMNodes().get(nodeId); {code} I liked the extra null check you perform on {{rmNode.getState()}} in {{ResourceTrackerService$updateAppCollectorsMap}}. I'd take it one step further: if you started that function with an rmNode == null condition (returning false in that case), we could drop the extra null checks in that function, and it would also make the null check in {{NodesListManager$isGracefullyDecommissionableNode}} unnecessary. I'd give a +1 (non-binding) pending these changes. 
> Race condition during decommissioning > - > > Key: YARN-9011 > URL: https://issues.apache.org/jira/browse/YARN-9011 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.1.1 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9011-001.patch, YARN-9011-002.patch, > YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, > YARN-9011-006.patch, YARN-9011-007.patch, YARN-9011-008.patch > > > During internal testing, we found a nasty race condition which occurs during > decommissioning. > Node manager, incorrect behaviour: > {noformat} > 2018-06-18 21:00:17,634 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received > SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting > down. > 2018-06-18 21:00:17,634 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from > ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 > hostname:node-6.hostname.com > {noformat} > Node manager, expected behaviour: > {noformat} > 2018-06-18 21:07:37,377 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received > SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting > down. > 2018-06-18 21:07:37,377 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from > ResourceManager: DECOMMISSIONING node-6.hostname.com:8041 is ready to be > decommissioned > {noformat} > Note the two different messages from the RM ("Disallowed NodeManager" vs > "DECOMMISSIONING"). 
The problem is that {{ResourceTrackerService}} can see an > inconsistent state of nodes while they're being updated: > {noformat} > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader > include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219} > exclude:{node-6.hostname.com} > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully > decommission node node-6.hostname.com:8041 with state RUNNING > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: > node-6.hostname.com > 2018-06-18 21:00:17,576 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node > node-6.hostname.com:8041 in DECOMMISSIONING. > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn > IP=172.26.22.115OPERATION=refreshNodes TARGET=AdminService > RESULT=SUCCESS > 2018-06-18 21:00:17,577 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve > original total capability: > 2018-06-18 21:00:17,577 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: >
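The guard-clause restructuring suggested in the review above can be sketched like this. The types are simplified stand-ins, not the real RMNode or ResourceTrackerService; the point is only the shape: an up-front null check lets every later call site drop its own.

```java
// Sketch of the suggested refactor: start the method with a null guard that
// returns false, so callers no longer need their own null checks.
// Hypothetical simplified types, not YARN's actual classes.
public class GuardSketch {

    public static class RMNode {
        public final String state;
        public RMNode(String state) { this.state = state; }
    }

    // Returns false immediately for an unknown (null) node; afterwards the
    // body can dereference rmNode freely.
    public static boolean isGracefullyDecommissionable(RMNode rmNode) {
        if (rmNode == null) {
            return false; // unknown node: nothing to decommission
        }
        return "DECOMMISSIONING".equals(rmNode.state);
    }

    public static void main(String[] args) {
        System.out.println(isGracefullyDecommissionable(null));                       // false
        System.out.println(isGracefullyDecommissionable(new RMNode("DECOMMISSIONING"))); // true
    }
}
```

Pushing the null check into the callee is what makes the extra check in the caller "inessential", as the review puts it: there is exactly one place that decides what a missing node means.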
[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA
[ https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968284#comment-16968284 ]

Hadoop QA commented on YARN-9605:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 35s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 23s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 4s | trunk passed |
| +1 | compile | 20m 15s | trunk passed |
| +1 | checkstyle | 2m 44s | trunk passed |
| +1 | mvnsite | 3m 42s | trunk passed |
| +1 | shadedclient | 19m 56s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 38s | trunk passed |
| +1 | javadoc | 3m 28s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 25s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 54s | the patch passed |
| +1 | compile | 17m 35s | the patch passed |
| -1 | cc | 17m 35s | root generated 5 new + 21 unchanged - 5 fixed = 26 total (was 26) |
| +1 | javac | 17m 35s | the patch passed |
| -0 | checkstyle | 2m 40s | root: The patch generated 4 new + 22 unchanged - 0 fixed = 26 total (was 22) |
| +1 | mvnsite | 3m 47s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 51s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 7m 14s | the patch passed |
| +1 | javadoc | 3m 22s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 9m 43s | hadoop-common in the patch failed. |
| +1 | unit | 0m 57s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 3m 55s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 94m 4s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 46s | The patch does not generate ASF License warnings. |
| | | 233m 52s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.fs.TestTrash |
| | hadoop.yarn.server.resourcemanager.TestRMRestart |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9605 |
| JIRA Patch URL |
[jira] [Commented] (YARN-9920) YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from FairScheduler
[ https://issues.apache.org/jira/browse/YARN-9920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968219#comment-16968219 ]

Sunil G commented on YARN-9920:
---

cc [~wilfreds], could you also please take a look?

> YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from
> FairScheduler
> --
>
> Key: YARN-9920
> URL: https://issues.apache.org/jira/browse/YARN-9920
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, security
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9920-001.patch, YARN-9920-002.patch, YARN-9920-003.patch
>
> YarnAuthorizationProvider AccessRequest has a null RemoteAddress in the case of FairScheduler. FSQueue#hasAccess uses Server.getRemoteAddress(), which is null when the call comes from RMWebServices or the EventDispatcher. It works fine when called by an IPC Server handler.
> FSQueue#hasAccess is called in three places; (2) and (3) return null.
>
> *1. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> FSQueue#hasAccess -> Server.getRemoteAddress returns the correct remote IP.*
>
> *2. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> AppAddedSchedulerEvent*
> *EventDispatcher -> FairScheduler#addApplication -> FSQueue.hasAccess -> Server.getRemoteAddress returns null*
>
> {code:java}
> org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:509)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1268)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:133)
>         at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> {code}
>
> *3. RMWebServices -> QueueACLsManager#checkAccess -> FSQueue.hasAccess -> Server.getRemoteAddress returns null.*
> {code:java}
> org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.checkAccess(FairScheduler.java:1610)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:84)
>         at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:270)
>         at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:553)
> {code}
>
> Have verified with CapacityScheduler and it works fine.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
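The pattern described above comes down to the IPC server keeping the current call (and thus the caller's address) in thread-local state, which is only populated on the handler thread itself; any check that runs later on a dispatcher thread sees nothing. A minimal sketch of that mechanism, assuming a simplified stand-in for Hadoop's IPC {{Server}} (class, field, and method names below are illustrative, not Hadoop's actual internals):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative stand-in: the caller's address lives in a ThreadLocal that is
// only set on the "IPC handler" thread, mirroring why Server.getRemoteAddress()
// returns null when an ACL check runs on the EventDispatcher thread instead.
public class RemoteAddressSketch {
    private static final ThreadLocal<String> CURRENT_CALLER = new ThreadLocal<>();

    // Shape of Server.getRemoteAddress(): value on the handler thread, null elsewhere.
    static String getRemoteAddress() {
        return CURRENT_CALLER.get();
    }

    // Runs the two-thread scenario and returns what each thread observed.
    static String[] demo() throws InterruptedException {
        BlockingQueue<Runnable> dispatcherQueue = new ArrayBlockingQueue<>(1);
        String[] seen = new String[2];

        // "IPC handler" thread: records the caller, reads it back, then hands
        // the work off to an async dispatcher (as the AppAddedSchedulerEvent
        // path hands off to FairScheduler#addApplication).
        Thread handler = new Thread(() -> {
            CURRENT_CALLER.set("172.26.22.115");
            seen[0] = getRemoteAddress();                       // non-null here
            dispatcherQueue.add(() -> seen[1] = getRemoteAddress());
        });
        handler.start();
        handler.join();

        // "EventDispatcher" thread: a different thread, so the ThreadLocal is empty.
        Thread dispatcher = new Thread(dispatcherQueue.take());
        dispatcher.start();
        dispatcher.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = demo();
        System.out.println("handler sees:    " + seen[0]);
        System.out.println("dispatcher sees: " + seen[1]);
    }
}
```

This is why a check that reads the address at evaluation time sees null off the RPC path; capturing the caller's address at the point the request enters the system, before handing off to another thread, avoids the problem.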