Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/601#discussion_r12181272
  
    --- Diff: docs/cluster-overview.md ---
    @@ -118,21 +118,25 @@ If you are ever unclear where configuration options are coming from. fine-graine
     information can be printed by adding the `--verbose` option to `./spark-submit`.
     
     ### Advanced Dependency Management
    -When using `./bin/spark-submit` jars will be automatically transferred to the cluster. For many
    -users this is sufficient. However, advanced users can add jars by calling `addFile` or `addJar`
    +When using `./bin/spark-submit` the app jar will be automatically transferred to the cluster. For
    +many users this is sufficient. However, advanced users can add jars by calling `addFile` or `addJar`
     on an existing SparkContext. This can be used to distribute JAR files (Java/Scala) or .egg and
     .zip libraries (Python) to executors. Spark uses the following URL scheme to allow different
     strategies for disseminating jars:
     
     - **file:** - Absolute paths and `file:/` URIs are served by the driver's HTTP file server, and
    -  every executor pulls the file from the driver HTTP server
    +  every executor pulls the file from the driver HTTP server. When running the driver in the cluster,
    +  the jars need a way of getting from the client to the driver so that it can host them. This is not
    +  currently supported with Spark standalone, and on YARN this requires passing additional jars on the
    +  command line with the `--jars` option.
     - **hdfs:**, **http:**, **https:**, **ftp:** - these pull down files and JARs from the URI as expected
     - **local:** - a URI starting with local:/ is expected to exist as a local file on each worker node.  This
       means that no network IO will be incurred, and works well for large files/JARs that are pushed to each worker,
       or shared via NFS, GlusterFS, etc.
     
     Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes.
    -Over time this can use up a significant amount of space and will need to be cleaned up.
    +With Mesos and the Spark Standalone cluster manager, this can use up a significant amount of space over
    --- End diff --
    
    Ah okay, fair enough. I guess this is still an issue for Mesos.
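    
    For anyone reading along, here's a rough sketch of the `addJar` / `addFile` usage the section above describes, using the URL schemes it lists. The app name, jar names, and paths are made up for illustration:
    
    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    
    // Hypothetical app; the master is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("DependencyExample")
    val sc = new SparkContext(conf)
    
    // file: / absolute path -- served by the driver's HTTP file server;
    // every executor pulls the jar from the driver.
    sc.addJar("file:/opt/libs/extra-lib.jar")
    
    // hdfs: -- each executor fetches the file directly from HDFS.
    sc.addFile("hdfs:///shared/lookup-data.zip")
    
    // local: -- assumed to already exist at this path on every worker node,
    // so no network IO is incurred.
    sc.addJar("local:/opt/libs/preinstalled-lib.jar")
    ```
    
    And in YARN cluster mode, those extra jars would instead go on the `spark-submit` command line via `--jars`, as the new text notes.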

