Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/15669
@jerryshao spark.files is always passed to the driver, so SparkContext.addFile is called even in yarn-cluster mode.
https://github.com/apache/spark/blob/7bf8a4049866b2ec7fdf0406b1ad0c3a12488645/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L609
```
// Load any properties specified through --conf and the default properties file
for ((k, v) <- args.sparkProperties) {
  sysProps.getOrElseUpdate(k, v)
}
```
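To make that concrete, here is a minimal, self-contained sketch of what ends up happening on the driver: if `spark.files` is present in the conf, SparkContext registers every entry through `addFile` during startup. The `local[*]` master and the temp file are stand-ins just for illustration, not the yarn-cluster setup itself:
```
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object SparkFilesSketch {
  def main(args: Array[String]): Unit = {
    // Throwaway file standing in for something the user passed via --files.
    val tmp = Files.createTempFile("spark-files-sketch", ".txt")

    // Setting spark.files here stands in for the value SparkSubmit forwards
    // into the driver's system properties.
    val conf = new SparkConf()
      .setMaster("local[*]")              // stand-in for yarn-cluster, illustration only
      .setAppName("spark-files-sketch")
      .set("spark.files", tmp.toString)

    // During startup SparkContext reads spark.files and calls addFile on each
    // entry, which is the behaviour discussed above.
    val sc = new SparkContext(conf)
    println(SparkFiles.get(tmp.getFileName.toString))
    sc.stop()
  }
}
```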
It seems the issue is that spark.files doesn't need to be passed to the driver in yarn-cluster mode. In that case it could be fixed in SparkSubmit.scala (see the rough sketch at the end of this comment). Another thing I noticed is some suspicious code in SparkContext.addJar. Is the following code still needed?
https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/core/src/main/scala/org/apache/spark/SparkContext.scala#L1710
```
if (master == "yarn" && deployMode == "cluster") {
  // In order for this to work in yarn cluster mode the user must specify the
  // --addJars option to the client to upload the file into the distributed cache
  // of the AM to make it show up in the current working directory.
  val fileName = new Path(uri.getPath).getName()
  try {
    env.rpcEnv.fileServer.addJar(new File(fileName))
  } catch {
    case e: Exception =>
      // For now just log an error but allow to go through so spark examples work.
      // The spark examples don't really need the jar distributed since its also
      // the app jar.
      logError("Error adding jar (" + e + "), was the --addJars option used?")
      null
  }
} else {
```
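On the first point, a very rough, untested sketch of what the SparkSubmit.scala change could look like; names like `sparkProperties`, `sysProps` and `isYarnCluster` stand in for the corresponding values inside SparkSubmit, and this is just to illustrate the idea, not an actual patch:
```
import scala.collection.mutable

// Untested sketch only: skip spark.files when merging properties for the driver
// in yarn-cluster mode, so SparkContext.addFile is not triggered again there.
def mergeDriverProps(
    sparkProperties: Map[String, String],
    sysProps: mutable.Map[String, String],
    isYarnCluster: Boolean): Unit = {
  for ((k, v) <- sparkProperties) {
    if (!(isYarnCluster && k == "spark.files")) {
      sysProps.getOrElseUpdate(k, v)
    }
  }
}
```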