Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/601#discussion_r12181301
--- Diff: docs/running-on-yarn.md ---
@@ -44,86 +28,47 @@ System Properties:
* `spark.yarn.max.executor.failures`, the maximum number of executor
failures before failing the application. The default is twice the number of
executors requested, with a minimum of 3.
* `spark.yarn.historyServer.address`, the address of the Spark history
server (e.g. host.com:18080). The address should not contain a scheme
(http://). It defaults to unset, since the history server is an optional
service. This address is given to the YARN ResourceManager when the Spark
application finishes, to link the application from the ResourceManager UI to the
Spark history server UI.
+By default, Spark on YARN will use a Spark jar installed locally, but the
Spark jar can also be in a world-readable location on HDFS. This allows YARN to
cache it on nodes so that it doesn't need to be distributed each time an
application runs. To point to a jar on HDFS, export SPARK_JAR=hdfs:/some/path.
--- End diff --
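For context, the properties quoted in the diff above might be set like this (a hypothetical sketch: only the property names and the history-server address format come from the docs; the jar path, failure count, and application jar are placeholders):

```shell
# Placeholder values; only the property names are taken from the quoted docs.
# Point YARN at a world-readable Spark jar on HDFS so it can be cached on nodes.
export SPARK_JAR=hdfs:///user/spark/share/spark-assembly.jar

spark-submit \
  --master yarn \
  --conf spark.yarn.max.executor.failures=6 \
  --conf spark.yarn.historyServer.address=host.com:18080 \
  my-app.jar
```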
jw - is it normal to do `hdfs:/some/path` and not `hdfs://some/path`? I
think they are technically both valid URLs.
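On that question: both forms parse as valid URIs, but they differ in whether an authority (host) component is present. A quick check with Python's standard `urllib` (an illustrative aside, not part of the Spark docs):

```python
from urllib.parse import urlparse

# hdfs://some/path: "some" is parsed as the authority (host), "/path" as the path.
two_slash = urlparse("hdfs://some/path")

# hdfs:/some/path: no authority at all; the entire "/some/path" is the path.
one_slash = urlparse("hdfs:/some/path")

print(two_slash.netloc, two_slash.path)  # some /path
print(one_slash.netloc, one_slash.path)  #  /some/path
```

So both are syntactically valid, but `hdfs://some/path` names a host `some`, while `hdfs:/some/path` leaves the authority empty, which Hadoop then resolves against the configured default filesystem.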
---