Re: master attempted to re-register the worker and then treated all workers as unregistered

2014-01-15 Thread Nan Zhu
I found the reason for the weird behaviour.

The executor throws an exception when starting, due to a bug in the application
code (I forgot to set an env variable used by the application code on every
machine).

The master then seems to remove the worker from its list (?), but the worker
keeps sending heartbeats and gets no reply; eventually all workers are dead…

Obviously it should not work this way; problematic application code should not
take down all the workers.

I’m checking the source code to find the reason.
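
For reference, the heartbeat handling in Master.scala seems to have roughly
this shape (I’m paraphrasing from memory, so treat it as a sketch rather than
the exact 0.8.1 source):

case Heartbeat(workerId) =>
  idToWorker.get(workerId) match {
    case Some(workerInfo) =>
      workerInfo.lastHeartbeat = System.currentTimeMillis()
    case None =>
      // unknown worker id: only a warning is logged and nothing is sent back,
      // which would explain why the worker keeps heartbeating with no reply
      logWarning("Got heartbeat from unregistered worker " + workerId)
  }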

Best,

--  
Nan Zhu


On Tuesday, January 14, 2014 at 8:53 PM, Nan Zhu wrote:

 Hi, all  
  
 I’m trying to deploy Spark in standalone mode, and everything goes as usual:  
  
 the web UI is accessible, and the master node wrote some logs saying all workers 
 are registered
  
 14/01/15 01:37:30 INFO Slf4jEventHandler: Slf4jEventHandler started  
 14/01/15 01:37:31 INFO ActorSystemImpl: 
 RemoteServerStarted@akka://sparkMaster@172.31.36.93 
 (mailto:sparkMaster@172.31.36.93):7077
 14/01/15 01:37:31 INFO Master: Starting Spark master at 
 spark://172.31.36.93:7077
 14/01/15 01:37:31 INFO MasterWebUI: Started Master web UI at 
 http://ip-172-31-36-93.us-west-2.compute.internal:8080
 14/01/15 01:37:31 INFO Master: I have been elected leader! New state: ALIVE
 14/01/15 01:37:34 INFO ActorSystemImpl: 
 RemoteClientStarted@akka://sparkwor...@ip-172-31-34-61.us-west-2.compute.internal
  (mailto:sparkwor...@ip-172-31-34-61.us-west-2.compute.internal):37914
 14/01/15 01:37:34 INFO ActorSystemImpl: 
 RemoteClientStarted@akka://sparkwor...@ip-172-31-40-28.us-west-2.compute.internal
  (mailto:sparkwor...@ip-172-31-40-28.us-west-2.compute.internal):43055
 14/01/15 01:37:34 INFO Master: Registering worker 
 ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
 14/01/15 01:37:34 INFO ActorSystemImpl: 
 RemoteClientStarted@akka://sparkwor...@ip-172-31-45-211.us-west-2.compute.internal
  (mailto:sparkwor...@ip-172-31-45-211.us-west-2.compute.internal):55355
 14/01/15 01:37:34 INFO Master: Registering worker 
 ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
 14/01/15 01:37:34 INFO Master: Registering worker 
 ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
 14/01/15 01:37:34 INFO ActorSystemImpl: 
 RemoteClientStarted@akka://sparkwor...@ip-172-31-41-251.us-west-2.compute.internal
  (mailto:sparkwor...@ip-172-31-41-251.us-west-2.compute.internal):47709
 14/01/15 01:37:34 INFO Master: Registering worker 
 ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
 14/01/15 01:37:34 INFO ActorSystemImpl: 
 RemoteClientStarted@akka://sparkwor...@ip-172-31-43-78.us-west-2.compute.internal
  (mailto:sparkwor...@ip-172-31-43-78.us-west-2.compute.internal):36257
 14/01/15 01:37:34 INFO Master: Registering worker 
 ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
 14/01/15 01:38:44 INFO ActorSystemImpl: 
 RemoteClientStarted@akka://sp...@ip-172-31-37-160.us-west-2.compute.internal 
 (mailto:sp...@ip-172-31-37-160.us-west-2.compute.internal):43086
  
  
  
  
 However, when I launched an application, the master first “attempted to 
 re-register the worker” and then reported that all heartbeats were from 
 “unregistered” workers. Can anyone tell me what happened here?
  
 14/01/15 01:38:44 INFO Master: Registering app ALS  
 14/01/15 01:38:44 INFO Master: Registered app ALS with ID 
 app-20140115013844-
 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/0 
 on worker 
 worker-20140115013734-ip-172-31-43-78.us-west-2.compute.internal-36257
 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/1 
 on worker 
 worker-20140115013734-ip-172-31-40-28.us-west-2.compute.internal-43055
 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/2 
 on worker 
 worker-20140115013734-ip-172-31-34-61.us-west-2.compute.internal-37914
 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/3 
 on worker 
 worker-20140115013734-ip-172-31-45-211.us-west-2.compute.internal-55355
 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/4 
 on worker 
 worker-20140115013734-ip-172-31-41-251.us-west-2.compute.internal-47709
 14/01/15 01:38:44 INFO Master: Registering worker 
 ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
 14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same 
 address: akka://sparkwor...@ip-172-31-40-28.us-west-2.compute.internal 
 (mailto:sparkwor...@ip-172-31-40-28.us-west-2.compute.internal):43055
 14/01/15 01:38:44 INFO Master: Registering worker 
 ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
 14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same 
 address: akka://sparkwor...@ip-172-31-34-61.us-west-2.compute.internal 
 (mailto:sparkwor...@ip-172-31-34-61.us-west-2.compute.internal):37914
 14/01/15 01:38:44 INFO Master: 

Anyone know how to submit spark job to yarn in java code?

2014-01-15 Thread John Zhao
Now I am working on a web application and I want to submit a Spark job to 
Hadoop YARN.
I have already built my own assembly and can run it from the command line with 
the following script:

export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export 
SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client  --jar 
./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar  
--class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 
3 --master-memory 1g --worker-memory 512m --worker-cores 1

It works fine.
Then I realized that it is hard to submit the job from a web application. It 
looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and 
spark-examples-assembly-0.8.1-incubating.jar are really big jars; I believe 
they contain everything.
So my questions are:
1) When I run the above script, which jar is being submitted to the YARN 
server?
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays 
the role of the client side and spark-examples-assembly-0.8.1-incubating.jar 
goes with the Spark runtime and examples that will run in YARN; am I right?
3) Does anyone have similar experience? I did lots of Hadoop MR work and want 
to follow the same logic to submit Spark jobs. For now I can only find the 
command-line way to submit a Spark job to YARN. I believe there is an easier 
way to integrate Spark in a web application.  


Thanks.
John.

Re: Anyone know how to submit spark job to yarn in java code?

2014-01-15 Thread Philip Ogren
Great question!  I was writing up a similar question this morning and 
decided to investigate some more before sending.  Here's what I'm 
trying.  I have created a new scala project that contains only 
spark-examples-assembly-0.8.1-incubating.jar and 
spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the 
classpath and I am trying to create a yarn-client SparkContext with the 
following:


val spark = new SparkContext("yarn-client", "my-app")

My hope is to run this on my laptop and have it execute/connect on the 
yarn application master.  The hope is that if I can get this to work, 
then I can do the same from a web application.  I'm trying to unpack 
run-example.sh, compute-classpath, SparkPi, *.yarn.Client to figure out 
what environment variables I need to set up etc.


I grabbed all the .xml files out of my cluster's conf directory (in my 
case /etc/hadoop/conf.cloudera.yarn), e.g. yarn-site.xml, and put 
them on my classpath.  I also set up the environment variables SPARK_JAR, 
SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, and SPARK_HOME.


When I run my simple scala script, I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Yarn 
application already ended,might be killed or not able to launch 
application master.
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
at 
org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
at SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)
at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)

I can look at my yarn UI and see that it registers a failed application, 
so I take this as incremental progress.  However, I'm not sure how to 
troubleshoot what I'm doing from here or if what I'm trying to do is 
even sensible/possible.  Any advice is appreciated.


Thanks,
Philip

On 1/15/2014 11:25 AM, John Zhao wrote:

Now I am working on a web application and I want to submit a Spark job to 
Hadoop YARN.
I have already built my own assembly and can run it from the command line with 
the following script:

export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export 
SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client  --jar 
./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar  
--class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 
3 --master-memory 1g --worker-memory 512m --worker-cores 1

It works fine.
Then I realized that it is hard to submit the job from a web application. It 
looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and 
spark-examples-assembly-0.8.1-incubating.jar are really big jars; I believe 
they contain everything.
So my questions are:
1) When I run the above script, which jar is being submitted to the YARN server?
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays 
the role of the client side and spark-examples-assembly-0.8.1-incubating.jar 
goes with the Spark runtime and examples that will run in YARN; am I right?
3) Does anyone have similar experience? I did lots of Hadoop MR work and want 
to follow the same logic to submit Spark jobs. For now I can only find the 
command-line way to submit a Spark job to YARN. I believe there is an easier 
way to integrate Spark in a web application.


Thanks.
John.




Re: Please help: change $SPARK_HOME/work directory for spark applications

2014-01-15 Thread Nan Zhu
Hi, Jin   

It’s SPARK_WORKER_DIR

Line 48 WorkerArguments.scala

if (System.getenv("SPARK_WORKER_DIR") != null) {
  workDir = System.getenv("SPARK_WORKER_DIR")
}
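
So exporting SPARK_WORKER_DIR in conf/spark-env.sh on every worker before 
starting it should move both the application jars and the executor 
stdout/stderr logs under whatever directory you choose (pick somewhere with 
more space). If I remember correctly the Worker also accepts a --work-dir 
(or -d) command-line option that does the same thing.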


Best,  

--  
Nan Zhu


On Wednesday, January 15, 2014 at 2:03 PM, Chen Jin wrote:

 Hi,
  
 Currently my application jars and logs are stored in $SPARK_HOME/work;
 I would like to change it to somewhere with more space. Could anyone
 advise me on this? Changing the log dir is straightforward (just
 export SPARK_LOG_DIR); however, there is no environment variable called
 SPARK_WORK_DIR.
  
 Thanks a lot,
  
 -chen  



Please help: change $SPARK_HOME/work directory for spark applications

2014-01-15 Thread Chen Jin
Hi,

Currently my application jars and logs are stored in $SPARK_HOME/work;
I would like to change it to somewhere with more space. Could anyone
advise me on this? Changing the log dir is straightforward (just
export SPARK_LOG_DIR); however, there is no environment variable called
SPARK_WORK_DIR.

Thanks a lot,

-chen


Exception in thread DAGScheduler scala.MatchError: None (of class scala.None$)

2014-01-15 Thread Soren Macbeth
Howdy,

I'm having some trouble understanding what this exception means, i.e., what
the problem it's complaining about is. The full stack trace is here:

https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4

I'm doing a simple map and then reduce.

TIA


libraryDependencies configuration is different for sbt assembly vs sbt run

2014-01-15 Thread kamatsuoka
When I run sbt assembly, I use the provided configuration in the
build.sbt library dependency, to avoid conflicts in the fat jar: 

libraryDependencies += "org.apache.spark" %% "spark-core" %
"0.8.1-incubating" % "provided"

But if I want to do sbt run, I have to remove the provided, otherwise it
doesn't find the Spark classes.

Is there a way to set up my build.sbt so that it does the right thing in
both cases, without monkeying with my build.sbt each time?
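
One possibility might be to keep the dependency marked "provided" and redefine 
the run task so it uses the compile classpath (which still includes provided 
dependencies). An untested sketch, assuming a recent sbt that supports 
.evaluated:

run in Compile := Defaults.runTask(
  fullClasspath in Compile,      // the compile classpath still contains "provided" deps
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated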





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/libraryDependencies-configuration-is-different-for-sbt-assembly-vs-sbt-run-tp565.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Exception in thread DAGScheduler scala.MatchError: None (of class scala.None$)

2014-01-15 Thread Soren Macbeth
0.8.1-incubating running locally.

On January 15, 2014 at 2:28:00 PM, Mark Hamstra (m...@clearstorydata.com) wrote:

Spark version?



On Wed, Jan 15, 2014 at 2:19 PM, Soren Macbeth so...@yieldbot.com wrote:
Howdy,

I'm having some trouble understanding what this exception means, i.e., what the 
problem it's complaining about is. The full stack trace is here:

https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4

I'm doing a simple map and then reduce.

TIA



Re: Exception in thread DAGScheduler scala.MatchError: None (of class scala.None$)

2014-01-15 Thread Mark Hamstra
Okay, that fits with what I was expecting.

What does your reduce function look like?


On Wed, Jan 15, 2014 at 2:33 PM, Soren Macbeth so...@yieldbot.com wrote:

 0.8.1-incubating running locally.

 On January 15, 2014 at 2:28:00 PM, Mark Hamstra 
 (m...@clearstorydata.com)
 wrote:

 Spark version?



 On Wed, Jan 15, 2014 at 2:19 PM, Soren Macbeth so...@yieldbot.com wrote:

 Howdy,

 I'm having some trouble understanding what this exception means, i.e.,
 what the problem it's complaining about is. The full stack trace is here:

 https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4

 I'm doing a simple map and then reduce.

 TIA





Re: Exception in thread DAGScheduler scala.MatchError: None (of class scala.None$)

2014-01-15 Thread Soren Macbeth
I'm working on a Clojure DSL, so my map and reduce functions are in Clojure,
but I updated the gist to include the code.

https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4

(map-reduce-1) works as expected, however, (map-reduce) throws that
exception. I've traced the types and outputs along the way and everything is
identical from what I can tell. (defsparkfn) uses (sparkop) under the hood
as well, so that code is essentially identical, which has me scratching my
head.


On Wed, Jan 15, 2014 at 2:56 PM, Mark Hamstra m...@clearstorydata.com wrote:

 Okay, that fits with what I was expecting.

 What does your reduce function look like?


 On Wed, Jan 15, 2014 at 2:33 PM, Soren Macbeth so...@yieldbot.com wrote:

 0.8.1-incubating running locally.

 On January 15, 2014 at 2:28:00 PM, Mark Hamstra 
 (m...@clearstorydata.com)
 wrote:

 Spark version?



 On Wed, Jan 15, 2014 at 2:19 PM, Soren Macbeth so...@yieldbot.com wrote:

 Howdy,

 I'm having some trouble understanding what this exception means, i.e.,
 what the problem it's complaining about is. The full stack trace is here:

 https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4

 I'm doing a simple map and then reduce.

 TIA






Re: Anyone know how to submit spark job to yarn in java code?

2014-01-15 Thread Philip Ogren

My problem seems to be related to this:
https://issues.apache.org/jira/browse/MAPREDUCE-4052

So, I will try running my setup from a Linux client and see if I have 
better luck.


On 1/15/2014 11:38 AM, Philip Ogren wrote:
Great question!  I was writing up a similar question this morning and 
decided to investigate some more before sending.  Here's what I'm 
trying.  I have created a new scala project that contains only 
spark-examples-assembly-0.8.1-incubating.jar and 
spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the 
classpath and I am trying to create a yarn-client SparkContext with 
the following:


val spark = new SparkContext("yarn-client", "my-app")

My hope is to run this on my laptop and have it execute/connect on the 
yarn application master.  The hope is that if I can get this to work, 
then I can do the same from a web application.  I'm trying to unpack 
run-example.sh, compute-classpath, SparkPi, *.yarn.Client to figure 
out what environment variables I need to set up etc.


I grabbed all the .xml files out of my cluster's conf directory (in my 
case /etc/hadoop/conf.cloudera.yarn), e.g. yarn-site.xml, and 
put them on my classpath.  I also set up the environment variables 
SPARK_JAR, SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, and SPARK_HOME.


When I run my simple scala script, I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Yarn 
application already ended,might be killed or not able to launch 
application master.
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
at 
org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
at 
SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)

at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)

I can look at my yarn UI and see that it registers a failed 
application, so I take this as incremental progress.  However, I'm not 
sure how to troubleshoot what I'm doing from here or if what I'm 
trying to do is even sensible/possible.  Any advice is appreciated.


Thanks,
Philip

On 1/15/2014 11:25 AM, John Zhao wrote:
Now I am working on a web application and I want to submit a Spark 
job to Hadoop YARN.
I have already built my own assembly and can run it from the command line by 
the following script:


export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export 
SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client  --jar 
./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar 
--class org.apache.spark.examples.SparkPi --args yarn-standalone 
--num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1


It works fine.
Then I realized that it is hard to submit the job from a web application. It 
looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and 
spark-examples-assembly-0.8.1-incubating.jar are really big jars; I 
believe they contain everything.

So my questions are:
1) When I run the above script, which jar is being submitted to the 
YARN server?
2) It looks like 
spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role 
of the client side and spark-examples-assembly-0.8.1-incubating.jar goes 
with the Spark runtime and examples that will run in YARN; am I 
right?
3) Does anyone have similar experience? I did lots of Hadoop MR 
work and want to follow the same logic to submit Spark jobs. For now I 
can only find the command-line way to submit a Spark job to YARN. I 
believe there is an easier way to integrate Spark in a web application.



Thanks.
John.






Reading files on a cluster / shared file system

2014-01-15 Thread Ognen Duzlevski
On a cluster where the nodes and the master all have access to a shared
filesystem/files - does spark read a file (like one resulting from
sc.textFile()) in parallel/different sections on each node? Or is the file
read on the master in sequence, with the chunks processed on the nodes afterwards?

Thanks!
Ognen


jarOfClass method not found in SparkContext

2014-01-15 Thread arjun biswas
Hello All ,

I have installed Spark on my machine and was successful in running sbt/sbt
package as well as sbt/sbt assembly. I am trying to run the examples in
Java from Eclipse; to be precise, I am trying to run the JavaLogQuery
example from Eclipse. The issue is that I am unable to resolve a compilation
problem: *jarOfClass is not available inside the Java Spark Context*.
I have added all the possible jars and am using Spark 0.8.1-incubating,
which is the latest one with Scala 2.9.3. I have added all the jars to the
classpath to the point that I do not get any import errors. However,
JavaSparkContext.jarOfClass gives the above error, saying the jarOfClass
method is unavailable in the JavaSparkContext. I am using Spark
0.8.1-incubating and Scala 2.9.3. Has anyone tried to run the Java sample
examples from Eclipse? Please note that this is a compile-time error in
Eclipse.

Regards
Arjun


Re: jarOfClass method not found in SparkContext

2014-01-15 Thread Tathagata Das
Could it be possible that you have an older version of JavaSparkContext
(i.e. from an older version of Spark) in your path? Please check that there
aren't two versions of Spark accidentally included in your class path used
in Eclipse. It would not give errors in the import (as it finds the
imported packages and classes) but would give such errors as it may be
unfortunately finding an older version of JavaSparkContext class in the
class path.
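
A quick way to check which jar the JavaSparkContext class is actually being 
loaded from (a plain JVM trick, nothing Spark-specific):

// prints the URL of the jar that provided the class; getCodeSource can be
// null for JDK bootstrap classes, but for a class packaged in a jar it works
println(classOf[org.apache.spark.api.java.JavaSparkContext]
  .getProtectionDomain.getCodeSource.getLocation)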

TD


On Wed, Jan 15, 2014 at 4:14 PM, arjun biswas arjunbiswas@gmail.com wrote:

 Hello All ,

 I have installed Spark on my machine and was successful in running sbt/sbt
 package as well as sbt/sbt assembly. I am trying to run the examples in
 Java from Eclipse; to be precise, I am trying to run the JavaLogQuery
 example from Eclipse. The issue is that I am unable to resolve a
 compilation problem: *jarOfClass is not available inside the Java
 Spark Context*. I have added all the possible jars and am using Spark
 0.8.1-incubating, which is the latest one with Scala 2.9.3. I have added
 all the jars to the classpath to the point that I do not get any import
 errors. However, JavaSparkContext.jarOfClass gives the above error, saying
 the jarOfClass method is unavailable in the JavaSparkContext. I am using
 Spark 0.8.1-incubating and Scala 2.9.3. Has anyone tried to run the Java
 sample examples from Eclipse? Please note that this is a compile-time
 error in Eclipse.

 Regards
 Arjun



RE: Anyone know how to submit spark job to yarn in java code?

2014-01-15 Thread Liu, Raymond
Hi

Regarding your question

1) When I run the above script, which jar is being submitted to the YARN 
server?

What the SPARK_JAR env variable points to and what --jar points to are both 
submitted to the YARN server.

2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the 
role of the client side and spark-examples-assembly-0.8.1-incubating.jar goes with 
the Spark runtime and examples that will run in YARN; am I right?

The spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar will also go to the YARN 
cluster, as the runtime for the app jar (spark-examples-assembly-0.8.1-incubating.jar).

3) Does anyone have similar experience? I did lots of Hadoop MR work and want 
to follow the same logic to submit Spark jobs. For now I can only find the 
command-line way to submit a Spark job to YARN. I believe there is an easier 
way to integrate Spark in a web application.

You can use yarn-client mode; you might want to take a look at 
docs/running-on-yarn.md, and probably try the master branch to check our 
latest updates to that part of the docs. In yarn-client mode, the 
SparkContext itself does something similar to what the command line does to 
submit a YARN job.

To use it from Java, you might want to try out JavaSparkContext instead of 
SparkContext. I haven't personally run it with complicated applications, but a 
small example app did work.
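
In Scala that would look something like the rough sketch below 
(JavaSparkContext has similar constructors); the jar path is a placeholder, 
and SPARK_JAR / SPARK_YARN_APP_JAR still need to be set in the environment 
before the JVM starts:

// rough sketch for 0.8.1 yarn-client mode, not tested end to end
val sc = new org.apache.spark.SparkContext(
  "yarn-client",                          // SparkContext itself submits the app to YARN
  "my-web-app",                           // app name shown in the YARN / Spark UIs
  System.getenv("SPARK_HOME"),
  Seq("/path/to/your-app-assembly.jar"))  // jars shipped to the executors
val sum = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
sc.stop()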


Best Regards,
Raymond Liu

-Original Message-
From: John Zhao [mailto:jz...@alpinenow.com] 
Sent: Thursday, January 16, 2014 2:25 AM
To: user@spark.incubator.apache.org
Subject: Anyone know how to submit spark job to yarn in java code?

Now I am working on a web application and I want to submit a Spark job to 
Hadoop YARN.
I have already built my own assembly and can run it from the command line with 
the following script:

export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export 
SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client  --jar 
./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar  
--class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 
3 --master-memory 1g --worker-memory 512m --worker-cores 1

It works fine.
Then I realized that it is hard to submit the job from a web application. It 
looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and 
spark-examples-assembly-0.8.1-incubating.jar are really big jars; I believe 
they contain everything.
So my questions are:
1) When I run the above script, which jar is being submitted to the YARN 
server?
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar 
plays the role of the client side and spark-examples-assembly-0.8.1-incubating.jar 
goes with the Spark runtime and examples that will run in YARN; am I right?
3) Does anyone have similar experience? I did lots of Hadoop MR work and 
want to follow the same logic to submit Spark jobs. For now I can only find the 
command-line way to submit a Spark job to YARN. I believe there is an easier 
way to integrate Spark in a web application.  


Thanks.
John.


Re: Reading files on a cluster / shared file system

2014-01-15 Thread Tathagata Das
If you are running a distributed Spark cluster over the nodes, then the
reading should be done in a distributed manner. If you give sc.textFile() a
local path to a directory in the shared file system, then each worker
should read a subset of the files in the directory by accessing them locally.
Nothing should be read on the master.
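
For example, a minimal sketch assuming the shared mount is visible at the same 
path on every node (the paths below are made up):

val sc = new org.apache.spark.SparkContext("spark://master-host:7077", "read-shared-fs")
// each worker reads its own splits of the files directly from the shared mount
val lines = sc.textFile("file:///shared/data/logs", 64)  // 64 = minimum number of splits
println(lines.count())
sc.stop()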

TD


On Wed, Jan 15, 2014 at 3:56 PM, Ognen Duzlevski
og...@nengoiksvelzud.com wrote:

 On a cluster where the nodes and the master all have access to a shared
 filesystem/files - does spark read a file (like one resulting from
 sc.textFile()) in parallel/different sections on each node? Or is the file
 read on the master in sequence, with the chunks processed on the nodes afterwards?

 Thanks!
 Ognen



Re: Master and worker nodes in standalone deployment

2014-01-15 Thread Nan Zhu
you can start a worker process on the master node

so that all nodes in your cluster can participate in the computation
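
(if I remember correctly, that is just a matter of also running 
./spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077 
on the master machine, or listing the master in conf/slaves so the launch 
scripts bring a worker up there as well)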

Best, 

-- 
Nan Zhu


On Wednesday, January 15, 2014 at 11:32 PM, Manoj Samel wrote:

 When Spark is deployed on a cluster in standalone deployment mode (v0.8.1), one 
 of the nodes is started as the master and the others as workers.
 
 What does the master node do? Does it participate in actual computations, 
 or does it just act as a coordinator? 
 
 Thanks,
 
 Manoj 



Re: Master and worker nodes in standalone deployment

2014-01-15 Thread Manoj Samel
Thanks,

Could you still explain what the master process does?


On Wed, Jan 15, 2014 at 8:36 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

 you can start a worker process in the master node

 so that all nodes in your cluster can participate in the computation

 Best,

 --
 Nan Zhu

 On Wednesday, January 15, 2014 at 11:32 PM, Manoj Samel wrote:

 When Spark is deployed on a cluster in standalone deployment mode (v0.8.1),
 one of the nodes is started as the master and the others as workers.

 What does the master node do? Does it participate in actual
 computations, or does it just act as a coordinator?

 Thanks,

 Manoj





Re: Master and worker nodes in standalone deployment

2014-01-15 Thread Nan Zhu
it keeps track of the running worker processes, asks the workers to launch 
executors for the tasks, communicates with the driver program, etc.

-- 
Nan Zhu


On Wednesday, January 15, 2014 at 11:37 PM, Manoj Samel wrote:

 Thanks,
 
  Could you still explain what the master process does?
 
 
 On Wed, Jan 15, 2014 at 8:36 PM, Nan Zhu zhunanmcg...@gmail.com 
 (mailto:zhunanmcg...@gmail.com) wrote:
  you can start a worker process in the master node
  
  so that all nodes in your cluster can participate in the computation
  
  Best, 
  
  -- 
  Nan Zhu
  
  
  On Wednesday, January 15, 2014 at 11:32 PM, Manoj Samel wrote:
  
   When Spark is deployed on a cluster in standalone deployment mode (v0.8.1), 
   one of the nodes is started as the master and the others as workers.
   
   What does the master node do? Does it participate in actual 
   computations, or does it just act as a coordinator? 
   
   Thanks,
   
   Manoj 
  
 



Re: jarOfClass method not found in SparkContext

2014-01-15 Thread arjun biswas
Thanks for pointing out that mistake. Yes, I was using the Spark
0.8.1-incubating jar with the master branch code examples. I corrected the mistake.

Regards


On Wed, Jan 15, 2014 at 5:51 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hm, are you sure you haven't included the master branch of Spark
 somehow in your classpath? jarOfClass was added to Java in the master
 branch and Spark 0.9.0 (RC). So it seems a lot like you have a newer
 (post 0.8.X) version of the examples.

 - Patrick

 On Wed, Jan 15, 2014 at 5:04 PM, arjun biswas arjunbiswas@gmail.com
 wrote:
  Could it be possible that you have an older version of JavaSparkContext
  (i.e. from an older version of Spark) in your path? Please check that
 there
  aren't two versions of Spark accidentally included in your class path
 used
  in Eclipse. It would not give errors in the import (as it finds the
 imported
  packages and classes) but would give such errors as it may be
 unfortunately
  finding an older version of JavaSparkContext class in the class path.
 
 
 
  I have the following three jars in the class path of eclipse .and no
 other
  jar is currently in the classpath
  1)google-collections-0.8.jar
  2)scala-library.jar
  3)spark-core_2.9.3-0.8.1-incubating.jar
 
  Am i using the correct jar files to run the java samples from eclipse ?
 
  Regards
 
 
 
 
  On Wed, Jan 15, 2014 at 4:36 PM, Tathagata Das 
 tathagata.das1...@gmail.com
  wrote:
 
  Could it be possible that you have an older version of JavaSparkContext
  (i.e. from an older version of Spark) in your path? Please check that
 there
  aren't two versions of Spark accidentally included in your class path
 used
  in Eclipse. It would not give errors in the import (as it finds the
 imported
  packages and classes) but would give such errors as it may be
 unfortunately
  finding an older version of JavaSparkContext class in the class path.
 
  TD
 
 
  On Wed, Jan 15, 2014 at 4:14 PM, arjun biswas 
 arjunbiswas@gmail.com
  wrote:
 
  Hello All ,
 
  I have installed spark on my machine and was succesful in running
 sbt/sbt
  package as well as sbt/sbt assembly . I am trying to run the examples
 in
  java from eclipse . To be precise i am trying to run the JavaLogQuery
  example from eclipse . The issue is i am unable to resolve this
 compilation
  problem of jarOfClass being not available inside the Java Spark
 Context . I
  have added all the possible jars and is using Spark 0.8.1 incubating
 which
  is the latest one with scala 2.9.3 .I have all jars to the classpath
 to the
  point that i do not get any import error . However
  JavaSparkContext.jarOfClass gives the above error saying the jarOfClass
  method is unavailable in the JavaSparkContext . I am using Spark-0.8.1
  incubating and scala 2.9.3 . Has anyone tried to run the java sample
  examples from eclipse . Please note that this is a compile time error
 in
  eclipse .
 
  Regards
  Arjun