Re: How to submit a job to Spark cluster?

Nan Zhu Thu, 20 Feb 2014 17:08:01 -0800

I think this is a confusing part of the current web UI: even when your standalone 
app finishes without any error, the executor state is still shown as KILLED.


In Spark, in most cases you don’t need to rely on a script to submit jobs; you 
only need to specify the master address when constructing a SparkContext object.

But if you want to launch the driver inside the cluster, you will need bin/spark-class: 
http://spark.incubator.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
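
As a rough sketch of the first approach (not code from this thread), the master address can be passed in as an argument instead of being hard-coded, so the same jar runs locally or against a cluster. The object name SubmitSketch, the helper masterUrl, the "local[2]" default, and the argument layout (master URL first, then jar paths to ship to the executors) are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; names and argument layout are illustrative only.
object SubmitSketch {
  // Use the first argument as the master URL if present, else fall back to a default.
  def masterUrl(args: Array[String], default: String): String =
    args.headOption.getOrElse(default)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster(masterUrl(args, "local[2]"))
      .setAppName("SubmitSketch")
      // Remaining arguments are treated as jars containing the application classes.
      .setJars(args.drop(1).toSeq)

    val sc = new SparkContext(conf)
    try {
      // A trivial job: count the even numbers in a small range.
      val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
      println("even numbers: " + evens)
    } finally {
      // Stop the context explicitly so the application shuts down cleanly.
      sc.stop()
    }
  }
}
```

Run with no arguments, this should use the local master; passing something like spark://hadoop-1.certus.com:7077 followed by the path of the packaged jar should run the same code on the standalone cluster.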

Best,  

--  
Nan Zhu


On Thursday, February 20, 2014 at 8:02 PM, Tao Xiao wrote:

> In a Hadoop cluster, the following command is the general way to submit a job:
>        bin/hadoop jar <job-jar> <arguments>
>  
>  
> Is there a similar general way to submit a job to a Spark cluster?
>  
> Besides, my job finished successfully, and the Spark Web UI shows that this 
> application's state is FINISHED, but each executor's state is KILLED. I can 
> see that the application produced the expected result, so why is each 
> executor's state reported as KILLED?
>  
> Completed Applications
>  
> ID:              app-20140220173957-0001
>                  (http://hadoop-1.certus.com:8080/app?appId=app-20140220173957-0001)
> Name:            SimpleDistributedApp (http://hadoop-1.certus.com:4040/)
> Cores:           12
> Memory per Node: 1024.0 MB
> Submitted Time:  2014/02/20 17:39:57
> User:            root
> State:           FINISHED
> Duration:        13 s
>  
> Executor Summary
>  
> ExecutorID  Worker                                           Cores  Memory  State
> 2           worker-20140220162542-hadoop-2.certus.com-49805  4      1024    KILLED
> 1           worker-20140220162542-hadoop-4.certus.com-40528  4      1024    KILLED
> 0           worker-20140220162542-hadoop-3.certus.com-47386  4      1024    KILLED
>  
> Worker UI and log links per executor:
>  
> Executor 2: http://hadoop-2.certus.com:8081/
>   stdout: http://hadoop-2.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=2&logType=stdout
>   stderr: http://hadoop-2.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=2&logType=stderr
> Executor 1: http://hadoop-4.certus.com:8081/
>   stdout: http://hadoop-4.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=1&logType=stdout
>   stderr: http://hadoop-4.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=1&logType=stderr
> Executor 0: http://hadoop-3.certus.com:8081/
>   stdout: http://hadoop-3.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=0&logType=stdout
>   stderr: http://hadoop-3.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=0&logType=stderr
>  
> Thanks
> Tao
>  
>  
>  
> 2014-02-21 0:00 GMT+08:00 Mayur Rustagi <[email protected]>:
> > You are specifying the Spark master in the jar:
> >  .setMaster("spark://hadoop-1.certus.com:7077")
> > so "sbt run" deploys the jar to the cluster and runs it there.
> > Regards
> > Mayur
> >  
> >  
> > Mayur Rustagi
> > Ph: +919632149971
> > http://www.sigmoidanalytics.com
> > https://twitter.com/mayur_rustagi
> >  
> >  
> >  
> > On Thu, Feb 20, 2014 at 7:22 AM, Nan Zhu <[email protected]> wrote:
> > > I’m not sure I understand your question correctly.
> > >  
> > > Do you mean you didn’t see the application information in the Spark Web UI 
> > > even though it generates the expected results?
> > >  
> > > Best,  
> > >  
> > > --  
> > > Nan Zhu
> > >  
> > >  
> > > On Thursday, February 20, 2014 at 10:13 AM, Tao Xiao wrote:
> > >  
> > > > My application source file,  SimpleDistributedApp.scala, is as  follows:
> > > >  
> > > > __________________________________________________________________  
> > > > import org.apache.spark.{SparkConf, SparkContext}
> > > >  
> > > > object SimpleDistributedApp {
> > > >     def main(args: Array[String]) = {
> > > >         val filepath = "hdfs://hadoop-1.certus.com:54310/user/root/samples/data"
> > > >  
> > > >         val conf = new SparkConf()
> > > >                     .setMaster("spark://hadoop-1.certus.com:7077")
> > > >                     .setAppName("SimpleDistributedApp")
> > > >                     .setSparkHome("/home/xt/soft/spark-0.9.0-incubating-bin-hadoop1")
> > > >                     .setJars(Array("target/scala-2.10/simple-distributed-app_2.10-1.0.jar"))
> > > >                     .set("spark.executor.memory", "1g")
> > > >  
> > > >         val sc = new SparkContext(conf)
> > > >         val text = sc.textFile(filepath, 3)
> > > >  
> > > >         val numOfHello = text.filter(line => line.contains("hello")).count()
> > > >  
> > > >         println("number of lines containing 'hello' is " + numOfHello)
> > > >         println("down")
> > > >     }
> > > > }
> > > >  
> > > > ______________________________________________________________________
> > > >  
> > > >  
> > > >  
> > > > The corresponding sbt file, $SPARK_HOME/simple.sbt,  is as follows:
> > > > _________________________________________________________________
> > > >  
> > > > name := "Simple Distributed App"  
> > > >  
> > > > version := "1.0"
> > > >  
> > > > scalaVersion := "2.10.3"  
> > > >  
> > > > libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"
> > > >  
> > > > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
> > > > _________________________________________________________________
> > > >  
> > > >  
> > > > I built the application into 
> > > > $SPARK_HOME/target/scala-2.10/simple-distributed-app_2.10-1.0.jar, 
> > > > using the command   
> > > >         SPARK_HADOOP_VERSION=1.2.1   sbt/sbt   package
> > > >  
> > > > I ran it using the command "sbt/sbt run" and it finished running 
> > > > successfully.  
> > > >  
> > > > But I'm not sure what the correct, general way to submit and run a 
> > > > job in a Spark cluster is. To be specific, after building a job into a 
> > > > JAR file, say simpleApp.jar, where should I put it and how should I 
> > > > submit it to the Spark cluster?
> > > >   
> > > >  
> > > >  
> > > >   
> > > >  
> > > >  
> > > >  
> > > >  
> > >  
> >  
>
