Hi Eric,

Have you tried setting the SPARK_WORKER_INSTANCES env variable before running spark-shell?
http://spark.apache.org/docs/0.9.0/running-on-yarn.html
-Sandy

On Mon, May 19, 2014 at 8:08 AM, Eric Friedman <e...@spottedsnake.net> wrote:

> Hi,
>
> I am working with a Cloudera 5 cluster with 192 nodes and can't work out
> how to get the Spark REPL to use more than 2 nodes in an interactive
> session.
>
> So, this works, but is non-interactive (using yarn-client as MASTER):
>
> /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/bin/spark-class \
>   org.apache.spark.deploy.yarn.Client \
>   --jar /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/examples/lib/spark-examples_2.10-0.9.0-cdh5.0.0.jar \
>   --class org.apache.spark.examples.SparkPi \
>   --args yarn-standalone \
>   --args 10 \
>   *--num-workers 100*
>
> There does not appear to be an (obvious?) way to get more than 2 nodes
> involved from the REPL.
>
> I am running the REPL like this:
>
> #!/bin/sh
>
> . /etc/spark/conf.cloudera.spark/spark-env.sh
>
> export SPARK_JAR=hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar
>
> export SPARK_WORKER_MEMORY=512m
>
> export MASTER=yarn-client
>
> exec $SPARK_HOME/bin/spark-shell
>
> Now if I comment out the line with `export SPARK_JAR=...' and run this
> again, I get an error like this:
>
> 14/05/19 08:03:41 ERROR Client: Error: You must set SPARK_JAR environment variable!
> Usage: org.apache.spark.deploy.yarn.Client [options]
> Options:
>   --jar JAR_PATH     Path to your application's JAR file (required in yarn-cluster mode)
>   --class CLASS_NAME Name of your application's main class (required)
>   --args ARGS        Arguments to be passed to your application's main class.
>                      Mutliple invocations are possible, each will be passed in order.
>   --num-workers NUM  Number of workers to start (Default: 2)
>   [...]
>
> But none of those options are exposed at the `spark-shell' level.
>
> Thanks in advance for your guidance.
>
> Eric
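For posterity, a minimal sketch of Sandy's suggestion, using the env-variable names from the linked Spark 0.9 running-on-yarn docs (SPARK_WORKER_INSTANCES, SPARK_WORKER_MEMORY, SPARK_WORKER_CORES). The worker count and sizing values below are illustrative, not recommendations:

```shell
#!/bin/sh
# Sketch: in Spark 0.9 yarn-client mode, spark-shell has no --num-workers
# flag; the YARN worker count is read from env variables instead.
. /etc/spark/conf.cloudera.spark/spark-env.sh

export MASTER=yarn-client
export SPARK_JAR=hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar
export SPARK_WORKER_INSTANCES=100   # env-var analogue of --num-workers 100
export SPARK_WORKER_MEMORY=512m     # memory per worker
export SPARK_WORKER_CORES=2         # cores per worker

exec $SPARK_HOME/bin/spark-shell
```

These must be exported in the environment before spark-shell launches, since the YARN application is requested at shell startup, not per job.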