Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/12678#issuecomment-215232131
No, those are internal Spark configs. I mean either use the
--jars/--files/--archives options to spark-submit, or use the corresponding
config options: spark.yarn.dist.archives, spark.yarn.dist.files, and
spark.jars. See http://spark.apache.org/docs/latest/running-on-yarn.html for
further descriptions of the configs, or run spark-submit --help.
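For example, a sketch of both approaches (the application jar and file names here are placeholders, not from this thread):

```shell
# Ship a local jar, a plain file, and an archive with the job
# using the spark-submit flags:
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars extra-lib.jar \
  --files app.properties \
  --archives deps.zip \
  my-app.jar

# The same distribution expressed via the corresponding configs:
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.jars=extra-lib.jar \
  --conf spark.yarn.dist.files=app.properties \
  --conf spark.yarn.dist.archives=deps.zip \
  my-app.jar
```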
On YARN, that causes whatever files you specify to be downloaded to each of
the driver/AM/executor nodes and placed in ./. Since ./ is included in the
classpath, if it's a plain file or jar you don't have to do anything else. If
it's an archive, it is extracted, and if the file you want on the classpath is
under a subdirectory, you need to modify the extraClassPath configs. This
properly handles things in hdfs:// or file://. If you specify something as
file://, Spark looks for it locally on your launcher box, uploads it to the
HDFS staging directory, and it then gets downloaded onto the nodes. If it's
already in HDFS, YARN simply downloads it to the executor before launching.
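To illustrate what that localization means for application code, here is a minimal simulation (not Spark itself; the directory and file names are made up): a file shipped with --files lands in the container's working directory, so the application can open it by its bare relative name.

```python
import os
import tempfile

# Simulate YARN localizing a --files entry into the container's
# working directory ("./"), then reading it by relative path.
container_dir = tempfile.mkdtemp()
with open(os.path.join(container_dir, "app.properties"), "w") as f:
    f.write("spark.option=value\n")

os.chdir(container_dir)  # executors start with cwd = the container dir
with open("./app.properties") as f:  # no absolute path needed
    content = f.read()
print(content.strip())
```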
Note the important point at the bottom of that page:
The --files and --archives options support specifying file names with the #,
similar to Hadoop. For example, you can specify --files
localtest.txt#appSees.txt: this uploads the file you have locally named
localtest.txt into HDFS, but it will be linked to by the name appSees.txt,
so your application should use the name appSees.txt to reference it when
running on YARN.
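A sketch of that rename syntax (my-app.jar is a placeholder; the file names are the ones from the quoted example):

```shell
# localtest.txt is uploaded to the HDFS staging directory but
# localized under the alias appSees.txt in each container's
# working directory, so the app opens "appSees.txt".
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files localtest.txt#appSees.txt \
  my-app.jar
```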