[ 
https://issues.apache.org/jira/browse/YARN-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910651#comment-16910651
 ] 

Eric Yang commented on YARN-9755:
---------------------------------

[~Prabhu Joseph] Thank you for the patch.  I found an odd problem with 
HA-enabled Resource Managers in a secure cluster: with patch 002 applied, 
one of the Resource Managers failed to start.  It looks like the RM tries to 
access the file system without using a Kerberos credential:

{code}
2019-08-19 18:42:58,683 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2019-08-19 18:42:59,966 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2019-08-19 18:42:59,975 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED
java.io.IOException: DestHost:destPort eyang-1.openstacklocal:9000 , LocalHost:localPort eyang-2.openstacklocal/172.26.111.18:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:837)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:812)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1557)
        at org.apache.hadoop.ipc.Client.call(Client.java:1499)
        at org.apache.hadoop.ipc.Client.call(Client.java:1396)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy9.mkdirs(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:664)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy10.mkdirs(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2443)
        at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2419)
        at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1328)
        at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1325)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1342)
        at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1317)
        at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2300)
        at org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider.initInternal(FileSystemBasedConfigurationProvider.java:88)
        at org.apache.hadoop.yarn.conf.ConfigurationProvider.init(ConfigurationProvider.java:39)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:272)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1568)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:769)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:732)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:826)
        at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1627)
        at org.apache.hadoop.ipc.Client.call(Client.java:1443)
        ... 28 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:179)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:392)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:622)
        at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:413)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:813)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:809)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:809)
        ... 31 more
{code}

> RM fails to start with FileSystemBasedConfigurationProvider
> -----------------------------------------------------------
>
>                 Key: YARN-9755
>                 URL: https://issues.apache.org/jira/browse/YARN-9755
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-9755-001.patch, YARN-9755-002.patch
>
>
> RM fails to start with the exception below when 
> FileSystemBasedConfigurationProvider is used.
> *Exception:*
> {code}
> 2019-08-16 12:05:33,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: java.io.IOException: Filesystem closed
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:868)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1281)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:1312)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1335)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1328)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1328)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1379)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1567)
> Caused by: java.io.IOException: java.io.IOException: Filesystem closed
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:64)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:346)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:445)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>         ... 14 more
> Caused by: java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:475)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1682)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1586)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1598)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1701)
>         at org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider.getConfigurationInputStream(FileSystemBasedConfigurationProvider.java:62)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:56)
> {code}
> FileSystemBasedConfigurationProvider uses the cached FileSystem instance, 
> which causes the issue.
> *Configs:*
> {code}
> <property><name>yarn.resourcemanager.configuration.provider-class</name><value>org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider</value></property>
> <property><name>yarn.resourcemanager.configuration.file-system-based-store</name><value>/yarn/conf</value></property>
> [yarn@yarndocker-1 yarn]$ hadoop fs -ls /yarn/conf
> -rw-r--r--   3 yarn supergroup       4138 2019-08-16 13:09 /yarn/conf/capacity-scheduler.xml
> -rw-r--r--   3 yarn supergroup        494 2019-08-16 11:41 /yarn/conf/core-site.xml
> -rw-r--r--   3 yarn supergroup      11392 2019-08-16 11:52 /yarn/conf/hadoop-policy.xml
> -rw-r--r--   3 yarn supergroup      11492 2019-08-16 11:41 /yarn/conf/yarn-site.xml
> {code}
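
For readers unfamiliar with the cached FileSystem behaviour mentioned in the 
description: Hadoop's FileSystem.get() normally hands out a shared instance 
from a cache, so closing it in one place closes it for every holder. The 
sketch below imitates that with a toy class and no Hadoop dependency. 
ToyFileSystem, CachedFsDemo, probe() and the hdfs://namenode:9000 URI are all 
hypothetical names made up for illustration; this is not the code from either 
patch, only a minimal model of the failure mode.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CachedFsDemo {

    /** Toy stand-in for a file system client. NOT the Hadoop class. */
    static final class ToyFileSystem {
        private boolean open = true;

        boolean exists(String path) throws IOException {
            if (!open) {
                // Same message as in the issue description's stack trace.
                throw new IOException("Filesystem closed");
            }
            return true;
        }

        void close() {
            open = false;
        }
    }

    // One shared instance per URI, imitating a cached lookup.
    private static final Map<String, ToyFileSystem> CACHE = new HashMap<>();

    /** Returns the shared (cached) instance for the URI. */
    static ToyFileSystem get(String uri) {
        return CACHE.computeIfAbsent(uri, u -> new ToyFileSystem());
    }

    /** Returns a private instance that no other caller can close. */
    static ToyFileSystem newInstance(String uri) {
        return new ToyFileSystem();
    }

    /** Reports "ok" or the exception message for a simple exists() call. */
    static String probe(ToyFileSystem fs) {
        try {
            fs.exists("/yarn/conf/capacity-scheduler.xml");
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String uri = "hdfs://namenode:9000"; // hypothetical URI

        // The provider holds the cached instance; another component closes it.
        ToyFileSystem providerFs = get(uri);
        get(uri).close();
        System.out.println(probe(providerFs)); // prints "Filesystem closed"

        // A private instance is unaffected by closes of the cached one.
        ToyFileSystem privateFs = newInstance(uri);
        get(uri).close();
        System.out.println(probe(privateFs)); // prints "ok"
    }
}
```

In real Hadoop the analogous uncached call is FileSystem.newInstance(); 
whether that is the right fix here, and how it interacts with the Kerberos 
login order reported in the comment above, is for the patch review to settle.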


