Right, thanks, that worked. My goal is to programmatically submit things to the YARN cluster. The underlying framework we have is a set of property files that specify different machines for dev, qe, and prod. While it's definitely possible to deploy different contents as the client's etc/hadoop directory, I was curious whether the only way is to set the different things up as environment variables, or whether there is a way to programmatically override particular configurations. I looked at the Client.scala code, and it seems to create a new Configuration object that isn't accessible from the outside, so most likely the answer is no, which is a reasonable answer. I just have to figure out a different deployment model for the different stages of the lifecycle.
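A minimal sketch of the per-stage idea, assuming a properties file per stage and a -Ddeploy.stage system property (both placeholders): keys prefixed with spark.hadoop.* are copied into the SparkContext's own Hadoop configuration, though, as noted, they do not appear to reach the Configuration the YARN Client creates for itself.

import java.io.FileInputStream
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.{SparkConf, SparkContext}

object StagedSubmit {
  def main(args: Array[String]) {
    // Pick the stage (dev | qe | prod); "deploy.stage" is just a placeholder name.
    val stage = System.getProperty("deploy.stage", "dev")
    val props = new Properties()
    props.load(new FileInputStream("conf/" + stage + ".properties"))

    val conf = new SparkConf()
    // Keys prefixed with "spark.hadoop." are copied into the SparkContext's
    // Hadoop configuration (e.g. spark.hadoop.fs.defaultFS -> fs.defaultFS).
    for ((k, v) <- props.asScala) {
      conf.set("spark.hadoop." + k, v)
    }

    val sc = new SparkContext("yarn-client", "Staged App", conf)
    // ... job body ...
    sc.stop()
  }
}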
Thanks,
Ron

On Thursday, April 3, 2014 6:29 AM, Tom Graves <tgraves...@yahoo.com> wrote:

You should just be making sure your HADOOP_CONF_DIR env variable is correct and not setting yarn.resourcemanager.address in SparkConf. For YARN/Hadoop you need to point it to the configuration files for your cluster; generally that setting goes into yarn-site.xml. If just setting it doesn't work, make sure $HADOOP_CONF_DIR is getting put into your classpath. I would also make sure HADOOP_PREFIX is being set.

Tom

On Wednesday, April 2, 2014 10:10 PM, Ron Gonzalez <zlgonza...@yahoo.com> wrote:

Hi,
  I have a small program, but I cannot seem to make it connect to the right properties of the cluster. I have SPARK_YARN_APP_JAR, SPARK_JAR and SPARK_HOME set properly. If I run this Scala file, I can see that it never uses the yarn.resourcemanager.address property that I set on the SparkConf instance. Any advice?

Thanks,
Ron

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.deploy.yarn.Client
import java.lang.System
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/home/rgonzalez/app/spark-0.9.0-incubating-bin-hadoop2/README.md"
    val conf = new SparkConf()
    conf.set("yarn.resourcemanager.address", "localhost:8050")
    val sc = new SparkContext("yarn-client", "Simple App", conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
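One way to verify the classpath point above is to print what the YARN configuration on the classpath actually resolves; a minimal sketch (the object name is just a placeholder), run with the same environment and classpath as the driver:

import org.apache.hadoop.yarn.conf.YarnConfiguration

object ConfCheck {
  def main(args: Array[String]) {
    println("HADOOP_CONF_DIR = " + System.getenv("HADOOP_CONF_DIR"))
    // YarnConfiguration loads yarn-site.xml (and core-site.xml) from the classpath.
    val yarnConf = new YarnConfiguration()
    // If this prints the stock default (0.0.0.0:8032 on Hadoop 2.x), the
    // cluster's yarn-site.xml is not being picked up.
    println("yarn.resourcemanager.address = " +
      yarnConf.get("yarn.resourcemanager.address"))
    println("fs.defaultFS = " + yarnConf.get("fs.defaultFS"))
  }
}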