Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19130
Hi @cloud-fan, the main purpose of `spark.yarn.dist.forceDownloadSchemes`
is to explicitly use Spark's own logic to handle remote resources instead of
relying on Hadoop. For example, if `spark.yarn.dist.forceDownloadSchemes` is
set to `http,https`, then these two kinds of resources will be downloaded
by Spark before being added to the distributed cache, even if they are
supported by the HTTP filesystem in Hadoop 2.9+. With Hadoop versions
earlier than 2.9, Hadoop does not support the HTTP filesystem, so Spark's
own logic is always used to download such resources, and it is not
necessary to configure this parameter.
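
To make the rule concrete, here is a rough Python sketch of the decision described above. It is not the actual Spark code: `parse_schemes`, `should_download`, and the sets of schemes assumed to be supported by each Hadoop version are hypothetical names and values used only for illustration.

```python
from urllib.parse import urlparse


def parse_schemes(value):
    """Split a comma-separated config value such as "http,https" into a set."""
    return {s.strip().lower() for s in value.split(",") if s.strip()}


def should_download(uri, force_download_schemes, hadoop_supported_schemes):
    """Return True if Spark itself should fetch the resource.

    Spark downloads when the scheme is listed in forceDownloadSchemes,
    or when Hadoop has no filesystem for that scheme.
    """
    scheme = urlparse(uri).scheme.lower()
    return (scheme in force_download_schemes
            or scheme not in hadoop_supported_schemes)


# Assumed, simplified scheme sets standing in for Hadoop's filesystem lookup.
hadoop_29_plus = {"hdfs", "file", "http", "https"}
older_hadoop = {"hdfs", "file"}

forced = parse_schemes("http,https")
not_forced = parse_schemes("")

uri = "https://example.com/app.jar"
print(should_download(uri, forced, hadoop_29_plus))      # True: forced by config
print(should_download(uri, not_forced, hadoop_29_plus))  # False: left to Hadoop
print(should_download(uri, not_forced, older_hadoop))    # True: Hadoop cannot handle it
```

The last call shows why the parameter is unnecessary on older Hadoop versions: the result is already `True` without any forced schemes.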