[GitHub] spark pull request: [SPARK-3319] [SPARK-3338] Resolve Spark submit...

andrewor14 Thu, 18 Sep 2014 12:47:59 -0700

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2232#issuecomment-56091918
  
    Hi @tgravescs, I believe every point of the behavior you listed is correct 
and preserved in this PR, since the PR only affects `spark.yarn.dist.*`, and 
these are resolved to `file://` URIs if relative paths are provided. So it 
seems OK to keep `spark.yarn.dist.*` in the list of things to resolve in 
`SparkSubmit`.
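
    To illustrate the resolution rule described above, here is a minimal 
sketch in Python. It is not Spark's actual code: the function names 
`resolve_uri` and `resolve_uris` are hypothetical, and the sketch assumes 
POSIX-style paths and a comma-separated list of entries. A path that already 
carries a scheme (such as `hdfs://`) is left alone; a bare or relative path 
becomes an absolute `file://` URI.

```python
import os
from urllib.parse import urlparse


def resolve_uri(path: str) -> str:
    """Hypothetical helper: keep paths that already have a scheme,
    otherwise turn the path into an absolute file:// URI."""
    if urlparse(path).scheme:
        # Already qualified (e.g. hdfs://, file://), so leave it untouched.
        return path
    # No scheme: resolve against the current working directory.
    return "file://" + os.path.abspath(path)


def resolve_uris(paths: str) -> str:
    """Hypothetical helper: apply resolve_uri to a comma-separated list."""
    entries = [p.strip() for p in paths.split(",") if p.strip()]
    return ",".join(resolve_uri(p) for p in entries)


if __name__ == "__main__":
    # The first entry is relative and gets a file:// prefix; the second is unchanged.
    print(resolve_uris("conf/app.properties,hdfs://namenode/libs/dep.jar"))
```

Under this rule, only entries without a scheme are rewritten, which is why 
paths that are already fully qualified keep their existing behavior.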
    
    In addition, here's a tangential clarification question: Isn't setting 
`SPARK_YARN_DIST_*` meaningless in cluster mode, because the driver is launched 
on one of the slave nodes and the resources specified here should already be 
visible to the executors, which are launched on the same nodes? If so, it would 
simplify things to always treat the paths specified through these variables 
as `hdfs://` paths regardless of the deploy mode. I believe we already do this 
in 
[ClientArguments](https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala),
 where we distinguish between client and cluster mode only in the comment.



