[ 
https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552384#comment-16552384
 ] 

Bibin A Chundatt edited comment on YARN-8541 at 7/23/18 6:24 AM:
-----------------------------------------------------------------

In case the queue is already provided, the app will be submitted to the 
specified queue and no exception will be thrown. 

But that is expected, and it is the old behaviour too, right?
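
To make the case above concrete, here is a minimal sketch of the behaviour being described. It is a simplified model under stated assumptions, not the actual UserGroupMappingPlacementRule code: the class names, the group lookup, and the "queue named after the first group" mapping are all hypothetical. It only illustrates that a lookup failure such as "No groups found for user" can surface when the rule has to derive the queue, and is never reached when the queue is already given.

```java
// Simplified, hypothetical model of the behaviour discussed in this comment.
// None of these classes exist in Hadoop; they only mirror the idea.

/** Stand-in for the YarnException seen in the stack trace of this issue. */
class PlacementSketchException extends Exception {
  PlacementSketchException(String message) {
    super(message);
  }
}

class PlacementSketch {

  /**
   * Hypothetical group lookup. Assumption: "user2" is the only user that
   * still exists; any other (for example, deleted) user has no groups.
   */
  String[] groupsFor(String user) throws PlacementSketchException {
    if ("user2".equals(user)) {
      return new String[] {"hadoop"};
    }
    throw new PlacementSketchException("No groups found for user " + user);
  }

  /**
   * Returns the queue for the app. If the submission already names a queue,
   * that queue is used and the group lookup is skipped, so no exception is
   * thrown even for a deleted user.
   */
  String place(String user, String requestedQueue)
      throws PlacementSketchException {
    if (requestedQueue != null && !requestedQueue.isEmpty()) {
      return requestedQueue;
    }
    // Assumption for illustration: map the user to a queue named after
    // the first group returned by the lookup.
    String[] groups = groupsFor(user);
    return groups[0];
  }
}
```

With this model, place("user1", "queueA") returns "queueA" even though user1 no longer exists, while place("user1", null) fails with "No groups found for user user1".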


was (Author: bibinchundatt):
Incase when queue is already provides, the app will get submitted to the queue 
specified .. Exception will not be thrown. 

But that is expected and old behaviour too rt ??

> RM startup failure on recovery after user deletion
> --------------------------------------------------
>
>                 Key: YARN-8541
>                 URL: https://issues.apache.org/jira/browse/YARN-8541
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: yimeng
>            Assignee: Bibin A Chundatt
>            Priority: Blocker
>         Attachments: YARN-8541.001.patch, YARN-8541.002.patch, 
> YARN-8541.003.patch
>
>
> My Hadoop version is 3.1.0. I found that RM startup fails on recovery with 
> the following test steps:
> 1. Create a user "user1" that has permission to submit apps.
> 2. Use user1 to submit a job and wait for the job to finish.
> 3. Delete user "user1".
> 4. Restart YARN.
> 5. The RM restart fails.
> RM logs:
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue 
> root: numChildQueue= 3, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, 
> numContainers=0 | CapacitySchedulerQueueManager.java:163
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue 
> mappings, override: false | UserGroupMappingPlacementRule.java:232
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized 
> CapacityScheduler with calculator=class 
> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, 
> minimumAllocation=<<memory:512, vCores:1>>, maximumAllocation=<<memory:65536, 
> vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms | 
> CapacityScheduler.java:392
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not 
> found | Configuration.java:2767
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS 
> Processing chain. Root 
> Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor].
>  | AMSProcessingChain.java:62
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement 
> handler will be used, all scheduling requests will be rejected. | 
> ApplicationMasterService.java:130
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding 
> [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor]
>  tp top of AMS Processing chain. | AMSProcessingChain.java:75
> 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the 
> winning of election | ActiveStandbyElector.java:897
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application 
> application_1531624956005_0001 submitted by user super reason: No groups 
> found for user super
>  at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1241)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application application_1531624956005_0001 submitted by user super reason: No 
> groups found for user super
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:206)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:68)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:798)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:369)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:357)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1455)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:828)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  ... 13 more
> 2018-07-16 16:24:59,713 | INFO | main-EventThread | Trying to re-establish ZK 
> session | ActiveStandbyElector.java:746
> 2018-07-16 16:24:59,715 | INFO | main-EventThread | Session: 
> 0x1100001cdf8c2ea7 closed | ZooKeeper.java:1325
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | Initiating client 
> connection, 
> connectString=187-4-64-187:24002,187-4-64-119:24002,187-4-64-248:24002 
> sessionTimeout=45000 
> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@62f6291c
>  | ZooKeeper.java:861
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | zookeeper.request.timeout 
> configured value is 120000. | ClientCnxn.java:141
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | 
> zookeeper.client.bind.port.range is not configured. | ClientCnxn.java:177



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
