Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/756#discussion_r12570645
--- Diff: docs/running-on-mesos.md ---
@@ -3,19 +3,105 @@ layout: global
title: Running Spark on Mesos
---
-Spark can run on clusters managed by [Apache
Mesos](http://mesos.apache.org/). Follow the steps below to install Mesos and
Spark:
-
-1. Download and build Spark using the instructions [here](index.html).
**Note:** Don't forget to consider what version of HDFS you might want to use!
-2. Download, build, install, and start Mesos {{site.MESOS_VERSION}} on
your cluster. You can download the Mesos distribution from a
[mirror](http://www.apache.org/dyn/closer.cgi/mesos/{{site.MESOS_VERSION}}/).
See the Mesos [Getting Started](http://mesos.apache.org/gettingstarted) page
for more information. **Note:** If you want to run Mesos without installing it
into the default paths on your system (e.g., if you don't have administrative
privileges to install it), you should also pass the `--prefix` option to
`configure` to tell it where to install. For example, pass
`--prefix=/home/user/mesos`. By default the prefix is `/usr/local`.
-3. Create a Spark "distribution" using `make-distribution.sh`.
-4. Rename the `dist` directory created from `make-distribution.sh` to
`spark-{{site.SPARK_VERSION}}`.
-5. Create a `tar` archive: `tar czf spark-{{site.SPARK_VERSION}}.tar.gz
spark-{{site.SPARK_VERSION}}`
-6. Upload this archive to HDFS or another place accessible from Mesos via
`http://`, e.g., [Amazon Simple Storage Service](http://aws.amazon.com/s3):
`hadoop fs -put spark-{{site.SPARK_VERSION}}.tar.gz
/path/to/spark-{{site.SPARK_VERSION}}.tar.gz`
-7. Create a file called `spark-env.sh` in Spark's `conf` directory, by
copying `conf/spark-env.sh.template`, and add the following lines to it:
- * `export MESOS_NATIVE_LIBRARY=<path to libmesos.so>`. This path is
usually `<prefix>/lib/libmesos.so` (where the prefix is `/usr/local` by
default, see above). Also, on Mac OS X, the library is called `libmesos.dylib`
instead of `libmesos.so`.
+# Why Mesos
+
+Spark can run on hardware clusters managed by [Apache
Mesos](http://mesos.apache.org/).
+
+The advantages of deploying Spark with Mesos include:
+- dynamic partitioning between Spark and other
+  [frameworks](https://mesos.apache.org/documentation/latest/mesos-frameworks/)
+- scalable partitioning between multiple instances of Spark
+
+To get started, follow the steps below to install Mesos and deploy Spark
jobs via Mesos.
+
+
+# Installing Mesos
+
+Spark {{site.SPARK_VERSION}} is designed for use with Mesos
{{site.MESOS_VERSION}} and does not
+require any special patches of Mesos.
+
+If you already have a Mesos cluster running, you can skip this Mesos
installation step.
+
+Otherwise, installing Mesos for Spark is no different from installing Mesos for use by other
+frameworks. You can install Mesos either from prebuilt packages or by compiling from source.
+
+## Prebuilt packages
+
+The Apache Mesos project publishes only source releases, not binary packages. However,
+third-party projects publish binary releases that may be helpful in setting Mesos up.
+
+One of those is Mesosphere. To install Mesos using the binary releases
provided by Mesosphere:
+
+1. Download the Mesos installation package from the Mesosphere [downloads page](http://mesosphere.io/downloads/)
+2. Follow their instructions for installation and configuration
+
+The Mesosphere installation documents suggest setting up ZooKeeper to handle Mesos master failover,
+but Mesos can also be run with a single master and no ZooKeeper.
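
As a rough sketch, a single-master setup without ZooKeeper can look like the following. The IP address and work directory are placeholders, and the flag names follow the Mesos getting-started documentation; consult it for the authoritative invocation:

```shell
# Start the master on one machine (no ZooKeeper; state kept in a local work directory)
mesos-master --ip=192.168.0.1 --work_dir=/var/lib/mesos

# On each worker machine, start a slave pointing at the master
mesos-slave --master=192.168.0.1:5050
```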
+
+## From source
+
+To install Mesos directly from the upstream project rather than a third
party, install from source.
+
+1. Download the Mesos distribution from a
[mirror](http://www.apache.org/dyn/closer.cgi/mesos/{{site.MESOS_VERSION}}/)
+2. Follow the Mesos [Getting
Started](http://mesos.apache.org/gettingstarted) page for compiling and
installing Mesos
+
+**Note:** If you want to run Mesos without installing it into the default
paths on your system
+(e.g., if you lack administrative privileges to install it), you should
also pass the
+`--prefix` option to `configure` to tell it where to install. For example,
pass
+`--prefix=/home/user/mesos`. By default the prefix is `/usr/local`.
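
Putting the note above together, a source build into a non-default prefix might look like this (the prefix path is the example value from above; substitute your own):

```shell
# Unpack, configure with a custom install prefix, build, and install Mesos
tar xzf mesos-{{site.MESOS_VERSION}}.tar.gz
cd mesos-{{site.MESOS_VERSION}}
./configure --prefix=/home/user/mesos
make
make install
```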
+
+## Verification
+
+To verify that the Mesos cluster is ready for Spark, navigate to the Mesos master web UI at port
+`:5050` and confirm that all expected machines are present in the slaves tab.
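
The same check can be done from the command line, assuming the master's HTTP endpoint is reachable and exposes the `state.json` route (the hostname here is a placeholder):

```shell
# Query the master's state; the "slaves" array should list every expected machine
curl http://master-host:5050/master/state.json
```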
+
+
+# Connecting Spark to Mesos
+
+To use Mesos from Spark, you need a Spark distribution available in a
place accessible by Mesos, and
+a Spark driver program configured to connect to Mesos.
+
+## Uploading Spark Distribution
+
+When Mesos runs a task on a Mesos slave for the first time, that slave
must have a distribution of
+Spark available for running the Spark Mesos executor backend. A
distribution of Spark is just a
+compiled binary version of Spark.
+
+The Spark distribution can be hosted at any Hadoop URI, including HTTP via
`http://`, [Amazon Simple
+Storage Service](http://aws.amazon.com/s3) via `s3://`, or HDFS via
`hdfs:///`.
+
+To use a precompiled distribution:
+
+1. Download a Spark distribution from the Spark [download
page](https://spark.apache.org/downloads.html)
+2. Upload it to HDFS, an HTTP server, or S3
+
+To host on HDFS, use the `hadoop fs -put` command: `hadoop fs -put spark-{{site.SPARK_VERSION}}.tar.gz
+/path/to/spark-{{site.SPARK_VERSION}}.tar.gz`
+
+
+Alternatively, if you are using a custom-compiled version of Spark, you will need to create a
+distribution using the `make-distribution.sh` script included in a Spark source tarball or checkout.
+
+1. Download and build Spark using the instructions [here](index.html)
+2. Create a Spark distribution using `make-distribution.sh`.
+3. Rename the `dist` directory created from `make-distribution.sh` to
`spark-{{site.SPARK_VERSION}}`.
+4. Create a `tar` archive: `tar czf spark-{{site.SPARK_VERSION}}.tar.gz
spark-{{site.SPARK_VERSION}}`
+5. Upload the archive to HDFS, an HTTP server, or S3
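
End to end, the custom-distribution steps above can be sketched as follows (the HDFS destination path is a placeholder, as above):

```shell
# From a built Spark source checkout: package a distribution and upload it
./make-distribution.sh
mv dist spark-{{site.SPARK_VERSION}}
tar czf spark-{{site.SPARK_VERSION}}.tar.gz spark-{{site.SPARK_VERSION}}
hadoop fs -put spark-{{site.SPARK_VERSION}}.tar.gz /path/to/spark-{{site.SPARK_VERSION}}.tar.gz
```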
+
+
+## Using a Mesos Master URL
+
+1. Edit `spark-env.sh` in the Spark `conf` directory and add the following
lines:
+ * `export MESOS_NATIVE_LIBRARY=<path to libmesos.so>`. This path is
typically
+ `<prefix>/lib/libmesos.so` where the prefix is `/usr/local` by
default. See Mesos installation
+ instructions above. Also, on Mac OS X, the library is called
`libmesos.dylib` instead of
+ `libmesos.so`.
* `export SPARK_EXECUTOR_URI=<path to
spark-{{site.SPARK_VERSION}}.tar.gz uploaded above>`.
- * `export MASTER=mesos://HOST:PORT` where HOST:PORT is the host and
port (default: 5050) of your Mesos master (or `zk://...` if using Mesos with
ZooKeeper).
-8. To run a Spark application against the cluster, when you create your
`SparkContext`, pass the string `mesos://HOST:PORT` as the master URL. In
addition, you'll need to set the `spark.executor.uri` property. For example:
+ * `export MASTER=mesos://HOST:PORT` where HOST:PORT is the host and
port (default: 5050) of your
--- End diff --
When using the spark shell or spark-submit, users can now just add the
`--master` flag. E.g.:
```
./bin/spark-shell --master mesos://host:port
```
Since this is the preferred way, it might be good to suggest it.