Right, thanks, that worked. My goal is to programmatically submit things to the YARN cluster. The underlying framework we have is a set of property files that specify different machines for dev, qe, and prod. While it's definitely possible to deploy different contents as the client's etc/hadoop directory, I was curious whether the only way is to set the different things up as environment variables, or whether there is a way to programmatically override particular configurations. I looked at the Client.scala code, and it seems to create a new Configuration object that isn't accessible from the outside, so most likely the answer is no, which is a reasonable answer. I just have to figure out a different deployment model for the different stages of the lifecycle.
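A minimal sketch of the per-stage idea, assuming a properties file per stage and a -Ddeploy.stage system property (both placeholders): keys prefixed with spark.hadoop.* are copied into the SparkContext's own Hadoop configuration, though, as noted, they do not appear to reach the Configuration the YARN Client creates for itself.

import java.io.FileInputStream
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.{SparkConf, SparkContext}

object StagedSubmit {
  def main(args: Array[String]) {
    // Pick the stage (dev | qe | prod); "deploy.stage" is just a placeholder name.
    val stage = System.getProperty("deploy.stage", "dev")
    val props = new Properties()
    props.load(new FileInputStream("conf/" + stage + ".properties"))

    val conf = new SparkConf()
    // Keys prefixed with "spark.hadoop." are copied into the SparkContext's
    // Hadoop configuration (e.g. spark.hadoop.fs.defaultFS -> fs.defaultFS).
    for ((k, v) <- props.asScala) {
      conf.set("spark.hadoop." + k, v)
    }

    val sc = new SparkContext("yarn-client", "Staged App", conf)
    // ... job body ...
    sc.stop()
  }
}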
Thanks,
Ron

On Thursday, April 3, 2014 6:29 AM, Tom Graves <tgraves...@yahoo.com> wrote:

You should just be making sure your HADOOP_CONF_DIR env variable is correct and not setting yarn.resourcemanager.address in SparkConf. For YARN/Hadoop you need to point it to the configuration files for your cluster; generally that setting goes into yarn-site.xml. If just setting it doesn't work, make sure $HADOOP_CONF_DIR is getting put into your classpath. I would also make sure HADOOP_PREFIX is being set.

Tom

On Wednesday, April 2, 2014 10:10 PM, Ron Gonzalez <zlgonza...@yahoo.com> wrote:

Hi,
  I have a small program, but I cannot seem to make it connect to the right properties of the cluster. I have SPARK_YARN_APP_JAR, SPARK_JAR and SPARK_HOME set properly. If I run this Scala file, I can see that it never uses the yarn.resourcemanager.address property that I set on the SparkConf instance. Any advice?

Thanks,
Ron

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.deploy.yarn.Client
import java.lang.System
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/home/rgonzalez/app/spark-0.9.0-incubating-bin-hadoop2/README.md"
    val conf = new SparkConf()
    conf.set("yarn.resourcemanager.address", "localhost:8050")
    val sc = new SparkContext("yarn-client", "Simple App", conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
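One way to verify the classpath point above is to print what the YARN configuration on the classpath actually resolves; a minimal sketch (the object name is just a placeholder), run with the same environment and classpath as the driver:

import org.apache.hadoop.yarn.conf.YarnConfiguration

object ConfCheck {
  def main(args: Array[String]) {
    println("HADOOP_CONF_DIR = " + System.getenv("HADOOP_CONF_DIR"))
    // YarnConfiguration loads yarn-site.xml (and core-site.xml) from the classpath.
    val yarnConf = new YarnConfiguration()
    // If this prints the stock default (0.0.0.0:8032 on Hadoop 2.x), the
    // cluster's yarn-site.xml is not being picked up.
    println("yarn.resourcemanager.address = " +
      yarnConf.get("yarn.resourcemanager.address"))
    println("fs.defaultFS = " + yarnConf.get("fs.defaultFS"))
  }
}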