Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/601#discussion_r12179554
--- Diff: docs/cluster-overview.md ---
@@ -118,21 +118,25 @@ If you are ever unclear where configuration options are coming from, fine-grained
 information can be printed by adding the `--verbose` option to `./spark-submit`.

 ### Advanced Dependency Management
-When using `./bin/spark-submit` jars will be automatically transferred to the cluster. For many
-users this is sufficient. However, advanced users can add jars by calling `addFile` or `addJar`
+When using `./bin/spark-submit` the app jar will be automatically transferred to the cluster. For
+many users this is sufficient. However, advanced users can add jars by calling `addFile` or `addJar`
 on an existing SparkContext. This can be used to distribute JAR files (Java/Scala) or .egg and
 .zip libraries (Python) to executors. Spark uses the following URL scheme to allow different
 strategies for disseminating jars:

 - **file:** - Absolute paths and `file:/` URIs are served by the driver's HTTP file server, and
-  every executor pulls the file from the driver HTTP server
+  every executor pulls the file from the driver HTTP server. When running the driver in the cluster,
+  the jars need a way of getting from the client to the driver so that it can host them. This is not
+  currently supported with Spark standalone, and on YARN this requires passing additional jars on the
+  command line with the `--jars` option.
 - **hdfs:**, **http:**, **https:**, **ftp:** - these pull down files and JARs from the URI as expected
 - **local:** - a URI starting with local:/ is expected to exist as a local file on each worker node. This
   means that no network IO will be incurred, and works well for large files/JARs that are pushed to each worker,
   or shared via NFS, GlusterFS, etc.

 Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes.
-Over time this can use up a significant amount of space and will need to be cleaned up.
+With Mesos and the Spark Standalone cluster manager, this can use up a significant amount of space over
--- End diff ---
Is this still an issue for Mesos?
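
A minimal sketch of how the URL schemes described in the text above map onto `addJar`/`addFile` calls on an existing SparkContext; the paths, jar names, and app name here are made up for illustration and are not from the PR:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DependencyExample {
  def main(args: Array[String]): Unit = {
    // Master and deploy mode are expected to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("dependency-example"))

    // file: / absolute path - served by the driver's HTTP file server;
    // every executor pulls the jar from the driver.
    sc.addJar("file:/opt/libs/extra-lib.jar")

    // hdfs:, http:, https:, ftp: - each executor pulls the jar straight from the URI.
    sc.addJar("hdfs:///user/someone/jars/extra-lib.jar")

    // local: - assumed to already exist at this path on every worker node, so no network IO.
    sc.addJar("local:/opt/shared/preinstalled-lib.jar")

    // .egg / .zip libraries (e.g. for Python) are distributed with addFile.
    sc.addFile("hdfs:///user/someone/deps/helpers.zip")

    sc.stop()
  }
}
```

In the cluster deploy mode case the new text calls out, the extra jars would instead be listed on the `spark-submit` command line with the `--jars` option (comma-separated), since files on the client are not visible to a driver running inside the cluster.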