Hi Eric,

Have you tried setting the SPARK_WORKER_INSTANCES env variable before running spark-shell?
http://spark.apache.org/docs/0.9.0/running-on-yarn.html
-Sandy

On Mon, May 19, 2014 at 8:08 AM, Eric Friedman <e...@spottedsnake.net> wrote:

> Hi,
>
> I am working with a Cloudera 5 cluster with 192 nodes and can't work out
> how to get the Spark REPL to use more than 2 nodes in an interactive
> session.
>
> So, this works, but is non-interactive (using yarn-client as MASTER):
>
> /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/bin/spark-class \
>   org.apache.spark.deploy.yarn.Client \
>   --jar /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/examples/lib/spark-examples_2.10-0.9.0-cdh5.0.0.jar \
>   --class org.apache.spark.examples.SparkPi \
>   --args yarn-standalone \
>   --args 10 \
>   *--num-workers 100*
>
> There does not appear to be an (obvious?) way to get more than 2 nodes
> involved from the REPL.
>
> I am running the REPL like this:
>
> #!/bin/sh
>
> . /etc/spark/conf.cloudera.spark/spark-env.sh
>
> export SPARK_JAR=hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar
>
> export SPARK_WORKER_MEMORY=512m
>
> export MASTER=yarn-client
>
> exec $SPARK_HOME/bin/spark-shell
>
> Now if I comment out the line with `export SPARK_JAR=...' and run this
> again, I get an error like this:
>
> 14/05/19 08:03:41 ERROR Client: Error: You must set SPARK_JAR environment variable!
> Usage: org.apache.spark.deploy.yarn.Client [options]
> Options:
>   --jar JAR_PATH     Path to your application's JAR file (required in yarn-cluster mode)
>   --class CLASS_NAME Name of your application's main class (required)
>   --args ARGS        Arguments to be passed to your application's main class.
>                      Mutliple invocations are possible, each will be passed in order.
>   --num-workers NUM  Number of workers to start (Default: 2)
>   [...]
>
> But none of those options are exposed at the `spark-shell' level.
>
> Thanks in advance for your guidance.
>
> Eric
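For posterity, a minimal sketch of Sandy's suggestion, using the env-variable names from the linked Spark 0.9 running-on-yarn docs (SPARK_WORKER_INSTANCES, SPARK_WORKER_MEMORY, SPARK_WORKER_CORES). The worker count and sizing values below are illustrative, not recommendations:

```shell
#!/bin/sh
# Sketch: in Spark 0.9 yarn-client mode, spark-shell has no --num-workers
# flag; the YARN worker count is read from env variables instead.
. /etc/spark/conf.cloudera.spark/spark-env.sh

export MASTER=yarn-client
export SPARK_JAR=hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar
export SPARK_WORKER_INSTANCES=100   # env-var analogue of --num-workers 100
export SPARK_WORKER_MEMORY=512m     # memory per worker
export SPARK_WORKER_CORES=2         # cores per worker

exec $SPARK_HOME/bin/spark-shell
```

These must be exported in the environment before spark-shell launches, since the YARN application is requested at shell startup, not per job.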