Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/969#discussion_r13417150
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -220,10 +220,21 @@ trait ClientBase extends Logging {
         }
       }
+    def getArg(arg: String, envVar: String, sysProp: String): String = {
+      if (arg != null && !arg.isEmpty) {
+        arg
+      } else if (System.getenv(envVar) != null && !System.getenv(envVar).isEmpty) {
+        System.getenv(envVar)
+      } else {
+        sparkConf.getOption(sysProp).orNull
+      }
+    }
     var cachedSecondaryJarLinks = ListBuffer.empty[String]
-    val fileLists = List( (args.addJars, LocalResourceType.FILE, true),
-      (args.files, LocalResourceType.FILE, false),
-      (args.archives, LocalResourceType.ARCHIVE, false) )
+    val fileLists = List((args.addJars, LocalResourceType.FILE, true),
+      (getArg(args.files, "SPARK_YARN_DIST_FILES", "spark.yarn.dist.files"),
+        LocalResourceType.FILE, false),
+      (getArg(args.archives, "SPARK_YARN_DIST_ARCHIVES", "spark.yarn.dist.archives"),
--- End diff --
I don't think env variables and conf entries should be handled here like
this.
YarnClientSchedulerBackend already deals with the env variable and command
line option for client mode. It seems that SparkSubmit might be missing code to
handle the env variable for cluster mode, though. Probably better to fix it
there, and leave this code to deal only with the command line args (which are
already correctly parsed).
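
To make the precedence explicit, here is a minimal, self-contained sketch of what the patch's getArg amounts to (not the actual ClientBase code; the resolve helper and the plain Map standing in for SparkConf are just illustrative): an explicit argument wins, then the environment variable, then the conf entry.

```scala
object ArgResolution {
  // Precedence as in the proposed getArg: CLI argument, then env variable,
  // then conf entry; None if nothing is set.
  def resolve(arg: String, envVar: String, confKey: String,
              conf: Map[String, String]): Option[String] = {
    Option(arg).filter(_.nonEmpty)
      .orElse(Option(System.getenv(envVar)).filter(_.nonEmpty))
      .orElse(conf.get(confKey))
  }

  def main(args: Array[String]): Unit = {
    val conf = Map("spark.yarn.dist.files" -> "hdfs:///tmp/extra.txt")
    // With no CLI arg and SPARK_YARN_DIST_FILES unset, the conf entry wins.
    println(resolve(null, "SPARK_YARN_DIST_FILES", "spark.yarn.dist.files", conf))
  }
}
```

Whatever the final precedence, my suggestion is that it be applied once, in SparkSubmit / YarnClientSchedulerBackend, so that ClientBase only ever sees already-resolved command line args.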