Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21734#discussion_r201322643
  
    --- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
 ---
    @@ -193,8 +193,7 @@ object YarnSparkHadoopUtil {
           sparkConf: SparkConf,
           hadoopConf: Configuration): Set[FileSystem] = {
         val filesystemsToAccess = sparkConf.get(FILESYSTEMS_TO_ACCESS)
    -      .map(new Path(_).getFileSystem(hadoopConf))
    -      .toSet
    +    val isRequestAllDelegationTokens = filesystemsToAccess.isEmpty
    --- End diff --
    
    `spark.yarn.access.hadoopFileSystems` is not used the way you think it is. I 
don't think changing the semantics of `spark.yarn.access.hadoopFileSystems` is 
the correct approach.
    
    Basically, your problem is that not all the nameservices are accessible in 
a federated HDFS; currently the Hadoop token provider throws an exception and 
skips the remaining filesystems. I think it would be better to try-catch and 
skip the bad cluster — that would be more meaningful than this fix.
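    To make the suggestion concrete, here is a rough sketch of the try-catch 
approach (illustrative only — the method name `resolveFileSystems` and its 
shape are hypothetical, not the actual patch; the real change would live in 
the Hadoop token provider):
    
    ```scala
    import scala.util.control.NonFatal
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    
    // Resolve each configured path to a FileSystem, skipping any
    // nameservice that is unreachable instead of failing the whole
    // token-acquisition pass.
    def resolveFileSystems(
        paths: Seq[String],
        hadoopConf: Configuration): Set[FileSystem] = {
      paths.flatMap { p =>
        try {
          Some(new Path(p).getFileSystem(hadoopConf))
        } catch {
          case NonFatal(e) =>
            // Inaccessible nameservice in a federated HDFS: log and ignore.
            None
        }
      }.toSet
    }
    ```
    
    That way the accessible clusters still get their delegation tokens and a 
bad nameservice only costs a logged warning.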
    
    If you don't want to get tokens from all the nameservices, I think you 
should change the HDFS configuration for Spark, since Spark assumes that all 
the nameservices are accessible. Also, token acquisition happens during 
application submission, so it is not a big problem even if the fetch is slow. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
