Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19130#discussion_r138154503
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -367,6 +368,52 @@ object SparkSubmit extends CommandLineUtils with Logging {
          }.orNull
        }
     
    +    // Using a dummy http URI to check if HTTP(s) FileSystem is available, it returns true in
    +    // Hadoop 2.9+, otherwise it returns false.
    +    val isHttpFsAvailable = Try { FileSystem.get(Utils.resolveURI("http://foo/bar"), hadoopConf) }
    +      .map(_ => true)
    +      .getOrElse(false)
    +    // When running in YARN cluster manager, we check the configuration
    +    // "spark.yarn.dist.forceDownloadResources", if true we always download remote HTTP(s)
    +    // resources to local and then re-upload them to Hadoop FS, if false we need to check the
    +    // availability of HTTP(s) FileSystem to decide whether to use HTTP(s) FS to handle resources
    +    // or not.
    +    if (clusterManager == YARN && (sparkConf.get(FORCE_DOWNLOAD_RESOURCES) || !isHttpFsAvailable)) {
    --- End diff --
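
    The probe in the diff works because the lookup throws when no FileSystem implementation is registered for the scheme. A minimal self-contained sketch of that pattern, with `lookupFs` as an illustrative stand-in for `FileSystem.get` (not the real Hadoop API):

    ```scala
    import scala.util.Try

    // Stand-in for FileSystem.get: throws when no FileSystem implementation is
    // registered for the scheme (as on Hadoop < 2.9 for http/https). The name,
    // signature, and registry set here are illustrative only.
    def lookupFs(scheme: String, registered: Set[String]): String =
      if (registered.contains(scheme)) s"$scheme-filesystem"
      else throw new UnsupportedOperationException(s"No FileSystem for scheme: $scheme")

    // The patch's probe pattern: attempt the lookup and map success/failure to a
    // Boolean. Note Try(...).isSuccess is equivalent to the patch's
    // .map(_ => true).getOrElse(false).
    def isFsAvailable(scheme: String, registered: Set[String]): Boolean =
      Try(lookupFs(scheme, registered)).isSuccess
    ```

    For example, `isFsAvailable("http", Set("hdfs", "file"))` is false, while `isFsAvailable("http", Set("http", "https"))` is true.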
    
    Do we somehow want to make this configurable per scheme? Right now it's basically http/https; in the future we might want to handle other filesystems that Hadoop doesn't support. Making this a settable config would make that easier.
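
    The suggestion above could look something like the sketch below, where the force-download decision is driven by a configurable set of schemes instead of a single boolean. The config shape (a comma-separated scheme list) and the helper names are hypothetical, not Spark's actual API:

    ```scala
    import java.net.URI

    // Hypothetical per-scheme variant of the force-download check. The scheme
    // set would come from a config such as a comma-separated list (e.g.
    // "http,https,ftp") rather than the boolean FORCE_DOWNLOAD_RESOURCES in the
    // patch; this is a sketch of the reviewer's idea, not the real implementation.
    def shouldForceDownload(
        uri: URI,
        forceDownloadSchemes: Set[String],
        fsAvailable: String => Boolean): Boolean = {
      val scheme = Option(uri.getScheme).getOrElse("file")
      // Download locally and re-upload when the scheme is explicitly listed,
      // or when no Hadoop FileSystem can serve it directly.
      forceDownloadSchemes.contains(scheme) || !fsAvailable(scheme)
    }
    ```

    This keeps the current behavior as the default (empty set, availability probe decides) while letting users opt specific schemes into the download-and-reupload path.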


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to