[
https://issues.apache.org/jira/browse/YARN-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912200#comment-16912200
]
Prabhu Joseph edited comment on YARN-9755 at 8/21/19 11:55 AM:
---------------------------------------------------------------
[~eyang] The RM server creates a {{Configuration}} object that reads the proxy
ACL list from the local core-site.xml ({{hadoop.proxyuser.yarn.groups}}), which
is overridden by the HDFS core-site.xml. These proxy settings are in turn
overridden by the local yarn-site.xml
({{yarn.resourcemanager.proxyuser.yarn.groups}}), which is overridden by the
HDFS yarn-site.xml.
The order of override is:
Local core-site.xml ({{hadoop.proxyuser.yarn.groups}}) -> HDFS core-site.xml ->
Local yarn-site.xml ({{yarn.resourcemanager.proxyuser.yarn.groups}}) -> HDFS
yarn-site.xml
The above issue happens if the final value of
{{hadoop.proxyuser.yarn.groups}} after all overrides does not allow the hbase
user. Users can maintain the proxy ACL list in any of the above four files,
which is error prone if the files hold different values.
Can you check whether the final value of {{hadoop.proxyuser.yarn.groups}} after
all overrides contains a list that allows the hbase user?
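To make the override order concrete, here is a minimal sketch in plain Java. It is a simplified model of the layering described above, not Hadoop's actual code: the class name {{ProxyAclOverrideSketch}}, the {{merge}} helper, and the group values are all hypothetical, and it assumes that a {{yarn.resourcemanager.proxyuser.*}} key simply replaces the matching {{hadoop.proxyuser.*}} key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the four-layer override order; NOT Hadoop's implementation.
class ProxyAclOverrideSketch {
    static final String HADOOP_PREFIX = "hadoop.proxyuser.";
    static final String RM_PREFIX = "yarn.resourcemanager.proxyuser.";

    // Applies the layers in order, so a later layer wins over an earlier one.
    // Assumption: an RM-prefixed key is written onto the matching hadoop.proxyuser key.
    static Map<String, String> merge(List<Map<String, String>> layers) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            for (Map.Entry<String, String> entry : layer.entrySet()) {
                String key = entry.getKey();
                if (key.startsWith(RM_PREFIX)) {
                    key = HADOOP_PREFIX + key.substring(RM_PREFIX.length());
                }
                effective.put(key, entry.getValue());
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        // Hypothetical values, listed in the override order from the comment.
        List<Map<String, String>> layers = List.of(
            Map.of("hadoop.proxyuser.yarn.groups", "hbase-group"),         // local core-site.xml
            Map.of(),                                                      // HDFS core-site.xml
            Map.of("yarn.resourcemanager.proxyuser.yarn.groups", "admins"), // local yarn-site.xml
            Map.of());                                                     // HDFS yarn-site.xml
        // The last layer that sets the key decides the effective value.
        System.out.println(merge(layers).get("hadoop.proxyuser.yarn.groups"));
    }
}
```

With these hypothetical values the local yarn-site.xml entry wins, so a group that was allowed only in the local core-site.xml is no longer part of the effective ACL.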
I have attached [^YARN-9755-004.patch], which reads the HDFS yarn-site.xml
before the proxy user refresh. Irrespective of patch 4, the above case should
work fine.
> RM fails to start with FileSystemBasedConfigurationProvider
> -----------------------------------------------------------
>
> Key: YARN-9755
> URL: https://issues.apache.org/jira/browse/YARN-9755
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9755-001.patch, YARN-9755-002.patch,
> YARN-9755-003.patch, YARN-9755-004.patch
>
>
> RM fails to start with the exception below when
> FileSystemBasedConfigurationProvider is used.
> *Exception:*
> {code}
> 2019-08-16 12:05:33,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: java.io.IOException: Filesystem closed
> at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
> at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:868)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1281)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:1312)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1335)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1328)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1328)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1379)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1567)
> Caused by: java.io.IOException: java.io.IOException: Filesystem closed
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:64)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:346)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:445)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> ... 14 more
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:475)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1682)
> at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1586)
> at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1598)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1701)
> at org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider.getConfigurationInputStream(FileSystemBasedConfigurationProvider.java:62)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:56)
> {code}
> FileSystemBasedConfigurationProvider uses the cached FileSystem, which causes
> the issue.
> *Configs:*
> {code}
> <property><name>yarn.resourcemanager.configuration.provider-class</name><value>org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider</value></property>
> <property><name>yarn.resourcemanager.configuration.file-system-based-store</name><value>/yarn/conf</value></property>
> [yarn@yarndocker-1 yarn]$ hadoop fs -ls /yarn/conf
> -rw-r--r-- 3 yarn supergroup 4138 2019-08-16 13:09 /yarn/conf/capacity-scheduler.xml
> -rw-r--r-- 3 yarn supergroup 494 2019-08-16 11:41 /yarn/conf/core-site.xml
> -rw-r--r-- 3 yarn supergroup 11392 2019-08-16 11:52 /yarn/conf/hadoop-policy.xml
> -rw-r--r-- 3 yarn supergroup 11492 2019-08-16 11:41 /yarn/conf/yarn-site.xml
> {code}