GitHub user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19074#discussion_r140159253
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -366,7 +376,7 @@ object SparkSubmit extends CommandLineUtils {
// If a python file is provided, add it to the child arguments and list of files to deploy.
// Usage: PythonAppRunner <main python file> <extra python files> [app arguments]
args.mainClass = "org.apache.spark.deploy.PythonRunner"
- args.childArgs = ArrayBuffer(args.primaryResource, args.pyFiles) ++ args.childArgs
+ args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
if (clusterManager != YARN) {
// The YARN backend distributes the primary file differently, so don't merge it.
args.files = mergeFileLists(args.files, args.primaryResource)
--- End diff --
Why do you think local files and a local primary resource should be used here?
For non-YARN deployments, `args.files` is assigned to `spark.files`, and Spark's
file server downloads those files locally for every executor, so it should be
fine to keep them as remote resources.
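To make this point concrete, here is a minimal Scala sketch of the non-YARN branch in the diff, assuming `mergeFileLists` simply concatenates comma-separated lists while skipping blank entries. `FileListSketch`, `filesToDeploy`, and the `hdfs://` paths are hypothetical, and the helper returns an `Option` rather than mirroring Spark's real signature. It shows that the primary resource is merged into the file list unchanged, so a remote URI stays remote in the list that ends up in `spark.files`.

```scala
// Hypothetical sketch for illustration only; this is not Spark's actual source.
object FileListSketch {

  // Merges comma-separated file lists, skipping null or blank inputs and entries.
  // Returns None when nothing is left to deploy.
  def mergeFileLists(lists: String*): Option[String] = {
    val entries = lists
      .filter(l => l != null && l.trim.nonEmpty)
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)
    if (entries.isEmpty) None else Some(entries.mkString(","))
  }

  // Models the branch in the diff: outside YARN, the primary resource is merged
  // into the file list as-is, so a remote URI (e.g. hdfs://...) stays remote.
  // On YARN the primary file is distributed differently, so it is not merged.
  def filesToDeploy(isYarn: Boolean, files: String, primaryResource: String): Option[String] =
    if (isYarn) mergeFileLists(files)
    else mergeFileLists(files, primaryResource)

  def main(args: Array[String]): Unit = {
    val nonYarn = filesToDeploy(isYarn = false, "hdfs:///deps/util.py", "hdfs:///apps/main.py")
    val yarn = filesToDeploy(isYarn = true, "hdfs:///deps/util.py", "hdfs:///apps/main.py")
    println(nonYarn) // Some(hdfs:///deps/util.py,hdfs:///apps/main.py)
    println(yarn)    // Some(hdfs:///deps/util.py)
  }
}
```

Nothing in this path needs the primary resource to be downloaded on the submitting side first; the executors resolve the remote URIs themselves.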
---