Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/2232#issuecomment-56091918
Hi @tgravescs, I believe every point of the behavior you listed is correct
and preserved in this PR, since it only affects `spark.yarn.dist.*`, and
relative paths given there are resolved to `file://`. So it seems OK to keep
`spark.yarn.dist.*` in the list of properties that `SparkSubmit` resolves.
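To make the resolution rule concrete, here is a minimal sketch, assuming POSIX-style paths and a comma-separated value like those accepted by `spark.yarn.dist.files`. `DistPathSketch`, `resolvePath`, and `resolvePathList` are hypothetical names for illustration, not Spark's actual code: an entry that already has a scheme (e.g. `hdfs://`) is left alone, and a schemeless entry becomes an absolute `file:` URI on the submitting machine.

```scala
import java.io.File
import java.net.URI
import scala.util.Try

// Hypothetical helper (not Spark's implementation) illustrating how
// schemeless entries in a spark.yarn.dist.* value could be resolved.
object DistPathSketch {

  // True if the entry parses as a URI with a scheme such as hdfs: or file:.
  // Entries that fail to parse (e.g. containing spaces) are treated as local paths.
  private def hasScheme(path: String): Boolean =
    Try(new URI(path)).toOption.exists(_.getScheme != null)

  // Leave scheme-qualified entries untouched; turn schemeless (relative or
  // absolute) local paths into absolute file: URIs.
  def resolvePath(path: String): String =
    if (hasScheme(path)) path
    else new File(path).getAbsoluteFile.toURI.toString

  // Apply resolvePath to every entry of a comma-separated list.
  def resolvePathList(paths: String): String =
    paths.split(",").map(_.trim).filter(_.nonEmpty).map(resolvePath).mkString(",")
}
```

Note that `File.toURI` produces `file:/abs/path` (a single slash), which is an equivalent form of `file:///abs/path`.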
In addition, here's a tangential clarification question: isn't setting
`SPARK_YARN_DIST_*` meaningless in cluster mode? The driver is launched on one
of the slave nodes, so the resources specified here should already be visible
to the executors, which are launched on the same nodes. If so, it would
simplify things to always treat the paths specified through these variables
as `hdfs://` paths, regardless of the deploy mode. I believe we already do
this in
[ClientArguments](https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala),
where client and cluster mode are distinguished only in a comment.