Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/21734#discussion_r201322643
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
---
@@ -193,8 +193,7 @@ object YarnSparkHadoopUtil {
sparkConf: SparkConf,
hadoopConf: Configuration): Set[FileSystem] = {
val filesystemsToAccess = sparkConf.get(FILESYSTEMS_TO_ACCESS)
- .map(new Path(_).getFileSystem(hadoopConf))
- .toSet
+ val isRequestAllDelegationTokens = filesystemsToAccess.isEmpty
--- End diff ---
`spark.yarn.access.hadoopFileSystems` is not used the way you think it is. I
don't think changing the semantics of `spark.yarn.access.hadoopFileSystems` is
the correct approach.
Basically, your problem is that not all of the nameservices are accessible in
a federated HDFS; currently the Hadoop token provider throws an exception and
skips the remaining FSs. I think it would be better to try-catch and skip the
bad cluster, which would be more meaningful than this fix.
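The try-catch-and-skip idea above can be sketched as follows. This is only an illustration, not Spark's actual token-provider code: the function name `fetchTokensIgnoringBadClusters` and its parameters are hypothetical, and the real implementation would fetch Hadoop delegation tokens and log a warning rather than print to stderr.

```scala
import scala.util.{Try, Success, Failure}

// Hedged sketch: given a list of nameservice URIs and a token-fetching
// function that may throw for inaccessible nameservices, collect tokens
// from the reachable clusters and skip the bad ones instead of aborting.
def fetchTokensIgnoringBadClusters[T](
    nameservices: Seq[String],
    fetchToken: String => T): Seq[T] = {
  nameservices.flatMap { ns =>
    Try(fetchToken(ns)) match {
      case Success(token) => Some(token)
      case Failure(e) =>
        // Real code would use a logger (e.g. logWarning) here.
        Console.err.println(s"Skipping inaccessible nameservice $ns: ${e.getMessage}")
        None
    }
  }
}
```

With this shape, one unreachable nameservice in the federated cluster no longer prevents token acquisition for the others.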
If you don't want to get tokens from all the nameservices, I think you
should change the HDFS configuration for Spark, since Spark assumes that all
the nameservices are accessible. Also, token acquisition happens during
application submission, so it is not a big problem whether the fetch is slow
or not.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]