[
https://issues.apache.org/jira/browse/YARN-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912516#comment-16912516
]
Eric Yang commented on YARN-9755:
---------------------------------
[~Prabhu Joseph] I look at my test environment again and found the error was on
my side. HDFS copy of configuration did not get uploaded properly. Hence,
patch 003 is more correctly written than patch 004. Patch 004 will use hdfs
copy fully for proxy users acl but I am not sure if HDFS copy should include
server side configuration. For example ssl-cert.xml can contain keystore
password, which would be a security hole, if everyone can read the config from
hdfs. If we can agree on HDFS copy only contains client configuration, it
would be a safer feature.
> RM fails to start with FileSystemBasedConfigurationProvider
> -----------------------------------------------------------
>
> Key: YARN-9755
> URL: https://issues.apache.org/jira/browse/YARN-9755
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-9755-001.patch, YARN-9755-002.patch,
> YARN-9755-003.patch, YARN-9755-004.patch
>
>
> RM fails to start with below exception when
> FileSystemBasedConfigurationProvider is used.
> *Exception:*
> {code}
> 2019-08-16 12:05:33,802 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException:
> java.io.IOException: Filesystem closed
> at
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
> at
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:868)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1281)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:1312)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1335)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1328)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1328)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1379)
> at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1567)
> Caused by: java.io.IOException: java.io.IOException: Filesystem closed
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:64)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:346)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:445)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> ... 14 more
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:475)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1682)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1586)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1598)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1701)
> at
> org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider.getConfigurationInputStream(FileSystemBasedConfigurationProvider.java:62)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:56)
> {code}
> FileSystemBasedConfigurationProvider uses the cached FileSystem causing the
> issue.
> *Configs:*
> {code}
> <property><name>yarn.resourcemanager.configuration.provider-class</name><value>org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider</value></property>
> <property><name>yarn.resourcemanager.configuration.file-system-based-store</name><value>/yarn/conf</value></property>
> [yarn@yarndocker-1 yarn]$ hadoop fs -ls /yarn/conf
> -rw-r--r-- 3 yarn supergroup 4138 2019-08-16 13:09
> /yarn/conf/capacity-scheduler.xml
> -rw-r--r-- 3 yarn supergroup 494 2019-08-16 11:41
> /yarn/conf/core-site.xml
> -rw-r--r-- 3 yarn supergroup 11392 2019-08-16 11:52
> /yarn/conf/hadoop-policy.xml
> -rw-r--r-- 3 yarn supergroup 11492 2019-08-16 11:41
> /yarn/conf/yarn-site.xml
> {code}
--
This message was sent by Atlassian Jira
(v8.3.2#803003)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]