Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/19130#discussion_r138154503
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -367,6 +368,52 @@ object SparkSubmit extends CommandLineUtils with Logging {
}.orNull
}
+    // Using a dummy http URI to check if HTTP(s) FileSystem is available, it returns true in
+    // Hadoop 2.9+, otherwise it returns false.
+    val isHttpFsAvailable = Try { FileSystem.get(Utils.resolveURI("http://foo/bar"), hadoopConf) }
+      .map(_ => true)
+      .getOrElse(false)
+    // When running in YARN cluster manager, we check the configuration
+    // "spark.yarn.dist.forceDownloadResources", if true we always download remote HTTP(s)
+    // resources to local and then re-upload them to Hadoop FS, if false we need to check the
+    // availability of HTTP(s) FileSystem to decide whether to use HTTP(s) FS to handle resources
+    // or not.
+    if (clusterManager == YARN && (sparkConf.get(FORCE_DOWNLOAD_RESOURCES) || !isHttpFsAvailable)) {
--- End diff --
do we somehow want to make this configurable per scheme? Right now it's
basically http/https; in the future we might want to handle other
filesystems that Hadoop doesn't support. Making this a settable config would
make that easier.
---
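To illustrate the per-scheme idea the reviewer is suggesting, here is a minimal, self-contained Scala sketch. The config key name (`spark.yarn.dist.forceDownloadSchemes`), the `SchemeFilter` object, and its methods are all hypothetical names chosen for this example, not anything this PR or Spark necessarily defines; it only shows how a comma-separated scheme list could replace the hard-coded HTTP(s) check.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical helper: decide per scheme whether a remote resource should be
// downloaded locally instead of being handled by a Hadoop FileSystem.
object SchemeFilter {
  // Assumed config key for illustration only; a comma-separated list of
  // schemes, e.g. "http,https", with "*" matching every scheme.
  val ForceDownloadSchemesKey = "spark.yarn.dist.forceDownloadSchemes"

  // Parse the configured value into a set of lower-case schemes.
  def parseSchemes(confValue: String): Set[String] =
    confValue.split(",").map(_.trim.toLowerCase).filter(_.nonEmpty).toSet

  // A resource is force-downloaded when its scheme is listed explicitly or
  // when the wildcard "*" is configured. Scheme-less URIs default to "file".
  def shouldDownload(uri: String, schemes: Set[String]): Boolean = {
    val scheme = Try(Option(new URI(uri).getScheme)).toOption.flatten.getOrElse("file")
    schemes.contains("*") || schemes.contains(scheme.toLowerCase)
  }
}
```

With `parseSchemes("http,https")`, a `http://host/app.jar` resource would be downloaded while a `hdfs://nn/app.jar` resource would not, which is the kind of per-scheme flexibility the comment asks about.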
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]