Re: master attempted to re-register the worker and then took all workers as unregistered
I found the reason for the weird behaviour: the executor throws an exception at startup due to a bug in the application code (I forgot to set an env variable used by the application code on every machine). The master then seems to remove the worker from its list (?), but the worker keeps sending heartbeats and gets no reply, and finally all workers are dead... Obviously it should not work this way; the problematic application code should not make all workers dead. I'm checking the source code to find the reason.

Best,

--
Nan Zhu

On Tuesday, January 14, 2014 at 8:53 PM, Nan Zhu wrote:

Hi, all

I'm trying to deploy Spark in standalone mode. Everything goes as usual: the web UI is accessible, and the master node wrote some logs saying all workers are registered:

14/01/15 01:37:30 INFO Slf4jEventHandler: Slf4jEventHandler started
14/01/15 01:37:31 INFO ActorSystemImpl: RemoteServerStarted@akka://sparkMaster@172.31.36.93:7077
14/01/15 01:37:31 INFO Master: Starting Spark master at spark://172.31.36.93:7077
14/01/15 01:37:31 INFO MasterWebUI: Started Master web UI at http://ip-172-31-36-93.us-west-2.compute.internal:8080
14/01/15 01:37:31 INFO Master: I have been elected leader! New state: ALIVE
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkwor...@ip-172-31-34-61.us-west-2.compute.internal:37914
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkwor...@ip-172-31-40-28.us-west-2.compute.internal:43055
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkwor...@ip-172-31-45-211.us-west-2.compute.internal:55355
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkwor...@ip-172-31-41-251.us-west-2.compute.internal:47709
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkwor...@ip-172-31-43-78.us-west-2.compute.internal:36257
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO ActorSystemImpl: RemoteClientStarted@akka://sp...@ip-172-31-37-160.us-west-2.compute.internal:43086

However, when I launched an application, the master first "attempted to re-register the worker" and then said that all heartbeats are from "unregistered" workers. Can anyone tell me what happened here?
14/01/15 01:38:44 INFO Master: Registering app ALS
14/01/15 01:38:44 INFO Master: Registered app ALS with ID app-20140115013844-
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/0 on worker worker-20140115013734-ip-172-31-43-78.us-west-2.compute.internal-36257
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/1 on worker worker-20140115013734-ip-172-31-40-28.us-west-2.compute.internal-43055
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/2 on worker worker-20140115013734-ip-172-31-34-61.us-west-2.compute.internal-37914
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/3 on worker worker-20140115013734-ip-172-31-45-211.us-west-2.compute.internal-55355
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/4 on worker worker-20140115013734-ip-172-31-41-251.us-west-2.compute.internal-47709
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkwor...@ip-172-31-40-28.us-west-2.compute.internal:43055
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkwor...@ip-172-31-34-61.us-west-2.compute.internal:37914
14/01/15 01:38:44 INFO Master:
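A rough sketch of the master-side logic these logs point at, paraphrased from my reading of 0.8.x-era Master.scala (simplified, not the exact code): a heartbeat from a worker the master has dropped is only logged, and nothing tells that worker to re-register.

import scala.collection.mutable.HashMap

object HeartbeatSketch {
  case class Heartbeat(workerId: String)
  class WorkerInfo { var lastHeartbeat: Long = System.currentTimeMillis() }
  val idToWorker = new HashMap[String, WorkerInfo]

  def handle(msg: Any): Unit = msg match {
    case Heartbeat(workerId) =>
      idToWorker.get(workerId) match {
        case Some(info) =>
          // Known worker: just refresh its liveness timestamp.
          info.lastHeartbeat = System.currentTimeMillis()
        case None =>
          // Unknown worker: the heartbeat is dropped apart from a warning,
          // so a removed worker keeps heartbeating into the void.
          println("Got heartbeat from unregistered worker " + workerId)
      }
  }
}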
Anyone know how to submit spark job to yarn in java code?
Now I am working on a web application and I want to submit a Spark job to Hadoop YARN. I have already done my own assembly and can run it on the command line with the following script:

export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1

It works fine. Then I realized that it is hard to submit the job from a web application. It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and spark-examples-assembly-0.8.1-incubating.jar are really big jars; I believe they contain everything. So my questions are:
1) When I run the above script, which jar is being submitted to the YARN server?
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side, while spark-examples-assembly-0.8.1-incubating.jar carries the Spark runtime and examples that will be running in YARN. Am I right?
3) Does anyone have any similar experience? I did lots of Hadoop MR stuff and want to follow the same logic to submit a Spark job. For now I can only find the command-line way to submit a Spark job to YARN. I believe there is an easy way to integrate Spark in a web application.

Thanks.
John.
Re: Anyone know how to submit spark job to yarn in java code?
Great question! I was writing up a similar question this morning and decided to investigate some more before sending. Here's what I'm trying. I have created a new Scala project that contains only spark-examples-assembly-0.8.1-incubating.jar and spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the classpath, and I am trying to create a yarn-client SparkContext with the following:

val spark = new SparkContext("yarn-client", "my-app")

My hope is to run this on my laptop and have it execute/connect on the YARN application master. The hope is that if I can get this to work, then I can do the same from a web application. I'm trying to unpack run-example.sh, compute-classpath, SparkPi, and *.yarn.Client to figure out what environment variables I need to set up, etc. I grabbed all the .xml files out of my cluster's conf directory (in my case /etc/hadoop/conf.cloudera.yarn), e.g. yarn-site.xml, and put them on my classpath. I also set up the environment variables SPARK_JAR, SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, and SPARK_HOME. When I run my simple Scala script, I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Yarn application already ended, might be killed or not able to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
at org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
at SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)
at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)

I can look at my YARN UI and see that it registers a failed application, so I take this as incremental progress. However, I'm not sure how to troubleshoot what I'm doing from here, or whether what I'm trying to do is even sensible/possible. Any advice is appreciated.

Thanks,
Philip

On 1/15/2014 11:25 AM, John Zhao wrote:

Now I am working on a web application and I want to submit a Spark job to Hadoop YARN. I have already done my own assembly and can run it on the command line with the following script:

export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1

It works fine. Then I realized that it is hard to submit the job from a web application. It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and spark-examples-assembly-0.8.1-incubating.jar are really big jars; I believe they contain everything. So my questions are:
1) When I run the above script, which jar is being submitted to the YARN server?
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side, while spark-examples-assembly-0.8.1-incubating.jar carries the Spark runtime and examples that will be running in YARN. Am I right?
3) Does anyone have any similar experience? I did lots of Hadoop MR stuff and want to follow the same logic to submit a Spark job. For now I can only find the command-line way to submit a Spark job to YARN. I believe there is an easy way to integrate Spark in a web application.
Thanks. John.
Re: Please help: change $SPARK_HOME/work directory for spark applications
Hi, Jin

It's SPARK_WORKER_DIR; see line 48 of WorkerArguments.scala:

if (System.getenv("SPARK_WORKER_DIR") != null) {
  workDir = System.getenv("SPARK_WORKER_DIR")
}

Best,

--
Nan Zhu

On Wednesday, January 15, 2014 at 2:03 PM, Chen Jin wrote:

Hi, Currently my application jars and logs are stored in $SPARK_HOME/work, and I would like to change that to somewhere with more space. Could anyone advise me on this? Changing the log dir is straightforward: just export SPARK_LOG_DIR. However, there is no environment variable named SPARK_WORK_DIR. Thanks a lot, -chen
Please help: change $SPARK_HOME/work directory for spark applications
Hi, Currently my application jars and logs are stored in $SPARK_HOME/work, and I would like to change that to somewhere with more space. Could anyone advise me on this? Changing the log dir is straightforward: just export SPARK_LOG_DIR. However, there is no environment variable named SPARK_WORK_DIR. Thanks a lot, -chen
Exception in thread DAGScheduler scala.MatchError: None (of class scala.None$)
Howdy, I'm having some trouble understanding what this exception means, i.e., what problem it's complaining about. The full stack trace is here: https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4 I'm doing a simple map and then reduce. TIA
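In general, scala.MatchError: None (of class scala.None$) means a pattern match somewhere was handed a None it has no case for; whether that match lives in the user code or inside Spark, the mechanism is the same. A minimal self-contained illustration (not the code from the gist, just a sketch of how the error arises):

object MatchErrorSketch {
  def main(args: Array[String]) {
    val opt: Option[Int] = None
    // Non-exhaustive match: there is no `case None => ...` branch, so
    // feeding it a None throws scala.MatchError: None (of class scala.None$).
    val doubled = opt match {
      case Some(v) => v * 2
    }
    println(doubled)
  }
}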
libraryDependencies configuration is different for sbt assembly vs sbt run
When I run sbt assembly, I use the "provided" configuration on the build.sbt library dependency, to avoid conflicts in the fat jar:

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.1-incubating" % "provided"

But if I want to do sbt run, I have to remove the "provided", otherwise it doesn't find the Spark classes. Is there a way to set up my build.sbt so that it does the right thing in both cases, without monkeying with my build.sbt each time?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/libraryDependencies-configuration-is-different-for-sbt-assembly-vs-sbt-run-tp565.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
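One commonly circulated workaround for this is to keep the dependency "provided" (so sbt assembly still excludes it) but point the run task at the compile classpath, which, unlike the runtime classpath, still contains provided dependencies. A sketch in sbt 0.13-era build.sbt syntax; adjust for your sbt version:

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.1-incubating" % "provided"

// `sbt run` normally resolves against the Runtime classpath, which drops
// "provided" dependencies; rewire it to the Compile classpath so the Spark
// classes are visible again, while `sbt assembly` keeps excluding them.
run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))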
Re: Exception in thread DAGScheduler scala.MatchError: None (of class scala.None$)
0.8.1-incubating running locally. On January 15, 2014 at 2:28:00 PM, Mark Hamstra (m...@clearstorydata.com) wrote: Spark version? On Wed, Jan 15, 2014 at 2:19 PM, Soren Macbeth so...@yieldbot.com wrote: Howdy, I'm having some trouble understanding what this exception means, i.e., what problem it's complaining about. The full stack trace is here: https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4 I'm doing a simple map and then reduce. TIA
Re: Exception in thread DAGScheduler scala.MatchError: None (of class scala.None$)
Okay, that fits with what I was expecting. What does your reduce function look like? On Wed, Jan 15, 2014 at 2:33 PM, Soren Macbeth so...@yieldbot.com wrote: 0.8.1-incubating running locally. On January 15, 2014 at 2:28:00 PM, Mark Hamstra (m...@clearstorydata.com) wrote: Spark version? On Wed, Jan 15, 2014 at 2:19 PM, Soren Macbeth so...@yieldbot.com wrote: Howdy, I'm having some trouble understanding what this exception means, i.e., what problem it's complaining about. The full stack trace is here: https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4 I'm doing a simple map and then reduce. TIA
Re: Exception in thread DAGScheduler scala.MatchError: None (of class scala.None$)
I'm working on a Clojure DSL, so my map and reduce functions are in Clojure, but I updated the gist to include the code. https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4 (map-reduce-1) works as expected; however, (map-reduce) throws that exception. I've traced the types and outputs along the way and everything is identical from what I can tell. (defsparkfn) uses (sparkop) under the hood as well, so that code is essentially identical, which has me scratching my head. On Wed, Jan 15, 2014 at 2:56 PM, Mark Hamstra m...@clearstorydata.com wrote: Okay, that fits with what I was expecting. What does your reduce function look like? On Wed, Jan 15, 2014 at 2:33 PM, Soren Macbeth so...@yieldbot.com wrote: 0.8.1-incubating running locally. On January 15, 2014 at 2:28:00 PM, Mark Hamstra (m...@clearstorydata.com) wrote: Spark version? On Wed, Jan 15, 2014 at 2:19 PM, Soren Macbeth so...@yieldbot.com wrote: Howdy, I'm having some trouble understanding what this exception means, i.e., what problem it's complaining about. The full stack trace is here: https://gist.github.com/sorenmacbeth/6f49aa1852d9097deee4 I'm doing a simple map and then reduce. TIA
Re: Anyone know how to submit spark job to yarn in java code?
My problem seems to be related to this: https://issues.apache.org/jira/browse/MAPREDUCE-4052 So, I will try running my setup from a Linux client and see if I have better luck.

On 1/15/2014 11:38 AM, Philip Ogren wrote:

Great question! I was writing up a similar question this morning and decided to investigate some more before sending. Here's what I'm trying. I have created a new Scala project that contains only spark-examples-assembly-0.8.1-incubating.jar and spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the classpath, and I am trying to create a yarn-client SparkContext with the following:

val spark = new SparkContext("yarn-client", "my-app")

My hope is to run this on my laptop and have it execute/connect on the YARN application master. The hope is that if I can get this to work, then I can do the same from a web application. I'm trying to unpack run-example.sh, compute-classpath, SparkPi, and *.yarn.Client to figure out what environment variables I need to set up, etc. I grabbed all the .xml files out of my cluster's conf directory (in my case /etc/hadoop/conf.cloudera.yarn), e.g. yarn-site.xml, and put them on my classpath. I also set up the environment variables SPARK_JAR, SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, and SPARK_HOME. When I run my simple Scala script, I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Yarn application already ended, might be killed or not able to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
at org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
at SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)
at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)

I can look at my YARN UI and see that it registers a failed application, so I take this as incremental progress. However, I'm not sure how to troubleshoot what I'm doing from here, or whether what I'm trying to do is even sensible/possible. Any advice is appreciated.

Thanks,
Philip

On 1/15/2014 11:25 AM, John Zhao wrote:

Now I am working on a web application and I want to submit a Spark job to Hadoop YARN. I have already done my own assembly and can run it on the command line with the following script:

export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1

It works fine. Then I realized that it is hard to submit the job from a web application. It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and spark-examples-assembly-0.8.1-incubating.jar are really big jars; I believe they contain everything. So my questions are:
1) When I run the above script, which jar is being submitted to the YARN server?
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side, while spark-examples-assembly-0.8.1-incubating.jar carries the Spark runtime and examples that will be running in YARN. Am I right?
3) Does anyone have any similar experience?
I did lots of Hadoop MR stuff and want to follow the same logic to submit a Spark job. For now I can only find the command-line way to submit a Spark job to YARN. I believe there is an easy way to integrate Spark in a web application. Thanks. John.
Reading files on a cluster / shared file system
On a cluster where the nodes and the master all have access to a shared filesystem/files: does Spark read a file (like one resulting from sc.textFile()) in parallel, with different sections on each node? Or is the file read on the master in sequence and the chunks processed on the nodes afterwards? Thanks! Ognen
jarOfClass method not found in SparkContext
Hello All, I have installed Spark on my machine and was successful in running sbt/sbt package as well as sbt/sbt assembly. I am trying to run the examples in Java from Eclipse; to be precise, I am trying to run the JavaLogQuery example. The issue is I am unable to resolve the compilation problem of jarOfClass not being available inside the JavaSparkContext. I have added all the possible jars and am using Spark 0.8.1-incubating, which is the latest one, with Scala 2.9.3. I have added all jars to the classpath to the point that I do not get any import error. However, JavaSparkContext.jarOfClass gives the above error saying the jarOfClass method is unavailable in the JavaSparkContext. I am using Spark 0.8.1-incubating and Scala 2.9.3. Has anyone tried to run the Java sample examples from Eclipse? Please note that this is a compile-time error in Eclipse. Regards Arjun
Re: jarOfClass method not found in SparkContext
Could it be possible that you have an older version of JavaSparkContext (i.e. from an older version of Spark) in your path? Please check that there aren't two versions of Spark accidentally included in the classpath used in Eclipse. It would not give errors in the imports (as it finds the imported packages and classes) but would give such errors if it is unfortunately finding an older version of the JavaSparkContext class in the classpath. TD On Wed, Jan 15, 2014 at 4:14 PM, arjun biswas arjunbiswas@gmail.com wrote: Hello All, I have installed Spark on my machine and was successful in running sbt/sbt package as well as sbt/sbt assembly. I am trying to run the examples in Java from Eclipse; to be precise, I am trying to run the JavaLogQuery example. The issue is I am unable to resolve the compilation problem of jarOfClass not being available inside the JavaSparkContext. I have added all the possible jars and am using Spark 0.8.1-incubating, which is the latest one, with Scala 2.9.3. I have added all jars to the classpath to the point that I do not get any import error. However, JavaSparkContext.jarOfClass gives the above error saying the jarOfClass method is unavailable in the JavaSparkContext. I am using Spark 0.8.1-incubating and Scala 2.9.3. Has anyone tried to run the Java sample examples from Eclipse? Please note that this is a compile-time error in Eclipse. Regards Arjun
RE: Anyone know how to submit spark job to yarn in java code?
Hi

Regarding your questions:

1) When I run the above script, which jar is being submitted to the YARN server?

What the SPARK_JAR env variable points to and what --jar points to are both submitted to the YARN server.

2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side and spark-examples-assembly-0.8.1-incubating.jar goes with the Spark runtime and examples which will be running in YARN, am I right?

The spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar will also go to the YARN cluster, as the runtime for the app jar (spark-examples-assembly-0.8.1-incubating.jar).

3) Does anyone have any similar experience? I did lots of Hadoop MR stuff and want to follow the same logic to submit a Spark job. For now I can only find the command-line way to submit a Spark job to YARN. I believe there is an easy way to integrate Spark in a web application.

You can use the yarn-client mode. You might want to take a look at docs/running-on-yarn.md, and probably try the master branch to check our latest updates on that part of the docs. In yarn-client mode, the SparkContext itself does a similar thing to what the command line does to submit a YARN job. Then, to use it from Java, you might want to try JavaSparkContext instead of SparkContext. I haven't personally run it with complicated applications, but a small example app did work.

Best Regards,
Raymond Liu

-----Original Message-----
From: John Zhao [mailto:jz...@alpinenow.com]
Sent: Thursday, January 16, 2014 2:25 AM
To: user@spark.incubator.apache.org
Subject: Anyone know how to submit spark job to yarn in java code?

Now I am working on a web application and I want to submit a Spark job to Hadoop YARN. I have already done my own assembly and can run it on the command line with the following script:

export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
./spark-class org.apache.spark.deploy.yarn.Client --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1

It works fine. Then I realized that it is hard to submit the job from a web application. It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and spark-examples-assembly-0.8.1-incubating.jar are really big jars; I believe they contain everything. So my questions are:
1) When I run the above script, which jar is being submitted to the YARN server?
2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the role of the client side, while spark-examples-assembly-0.8.1-incubating.jar carries the Spark runtime and examples that will be running in YARN. Am I right?
3) Does anyone have any similar experience? I did lots of Hadoop MR stuff and want to follow the same logic to submit a Spark job. For now I can only find the command-line way to submit a Spark job to YARN. I believe there is an easy way to integrate Spark in a web application.

Thanks.
John.
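To make the yarn-client suggestion concrete, here is a minimal Scala sketch of what it can look like on 0.8.x-era Spark. It assumes SPARK_JAR, SPARK_YARN_APP_JAR and YARN_CONF_DIR are already exported in the environment of the JVM that creates the context, as in the script above; the app name and workload are illustrative:

import org.apache.spark.SparkContext

object YarnClientSketch {
  def main(args: Array[String]) {
    // "yarn-client" makes the SparkContext itself submit the YARN
    // application, playing the role that the
    // org.apache.spark.deploy.yarn.Client command line plays in
    // yarn-standalone mode.
    val sc = new SparkContext("yarn-client", "web-app-job")
    val sum = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
    println("sum = " + sum)
    sc.stop()
  }
}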
Re: Reading files on a cluster / shared file system
If you are running a distributed Spark cluster over the nodes, then the reading should be done in a distributed manner. If you give sc.textFile() a local path to a directory in the shared file system, then each worker should read a subset of the files in the directory by accessing them locally. Nothing should be read on the master. TD On Wed, Jan 15, 2014 at 3:56 PM, Ognen Duzlevski og...@nengoiksvelzud.com wrote: On a cluster where the nodes and the master all have access to a shared filesystem/files: does Spark read a file (like one resulting from sc.textFile()) in parallel, with different sections on each node? Or is the file read on the master in sequence and the chunks processed on the nodes afterwards? Thanks! Ognen
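As a concrete illustration of the above, a minimal sketch (the master URL and path are placeholders); the path must be mounted identically on every node:

import org.apache.spark.SparkContext

object SharedFsRead {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master:7077", "shared-fs-read")
    // Each worker reads its own partitions of the files under this shared
    // mount locally; the master does not stream the data to the workers.
    val lines = sc.textFile("/mnt/shared/input")
    println("line count: " + lines.count())
    sc.stop()
  }
}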
Re: Master and worker nodes in standalone deployment
You can start a worker process on the master node so that all nodes in your cluster can participate in the computation.

Best,

--
Nan Zhu

On Wednesday, January 15, 2014 at 11:32 PM, Manoj Samel wrote:

When Spark is deployed on a cluster in standalone deployment mode (v0.8.1), one of the nodes is started as the master and the others as workers. What does the master node do? Can it participate in actual computations, or does it just act as a coordinator? Thanks, Manoj
Re: Master and worker nodes in standalone deployment
Thanks. Could you still explain what the master process does? On Wed, Jan 15, 2014 at 8:36 PM, Nan Zhu zhunanmcg...@gmail.com wrote: You can start a worker process on the master node so that all nodes in your cluster can participate in the computation. Best, -- Nan Zhu On Wednesday, January 15, 2014 at 11:32 PM, Manoj Samel wrote: When Spark is deployed on a cluster in standalone deployment mode (v0.8.1), one of the nodes is started as the master and the others as workers. What does the master node do? Can it participate in actual computations, or does it just act as a coordinator? Thanks, Manoj
Re: Master and worker nodes in standalone deployment
It keeps track of the running worker processes, has executors created on the worker nodes for an application's tasks, communicates with the driver program, etc. -- Nan Zhu On Wednesday, January 15, 2014 at 11:37 PM, Manoj Samel wrote: Thanks. Could you still explain what the master process does? On Wed, Jan 15, 2014 at 8:36 PM, Nan Zhu zhunanmcg...@gmail.com wrote: You can start a worker process on the master node so that all nodes in your cluster can participate in the computation. Best, -- Nan Zhu On Wednesday, January 15, 2014 at 11:32 PM, Manoj Samel wrote: When Spark is deployed on a cluster in standalone deployment mode (v0.8.1), one of the nodes is started as the master and the others as workers. What does the master node do? Can it participate in actual computations, or does it just act as a coordinator? Thanks, Manoj
Re: jarOfClass method not found in SparkContext
Thanks for pointing me to that mistake. Yes, I was using the Spark 0.8.1-incubating jar with the master branch code examples. I corrected the mistake. Regards On Wed, Jan 15, 2014 at 5:51 PM, Patrick Wendell pwend...@gmail.com wrote: Hm, are you sure you haven't included the master branch of Spark somehow in your classpath? jarOfClass was added to the Java API in the master branch and Spark 0.9.0 (RC). So it seems a lot like you have a newer (post-0.8.X) version of the examples. - Patrick On Wed, Jan 15, 2014 at 5:04 PM, arjun biswas arjunbiswas@gmail.com wrote: Could it be possible that you have an older version of JavaSparkContext (i.e. from an older version of Spark) in your path? Please check that there aren't two versions of Spark accidentally included in the classpath used in Eclipse. It would not give errors in the imports (as it finds the imported packages and classes) but would give such errors if it is unfortunately finding an older version of the JavaSparkContext class in the classpath. I have the following three jars in the Eclipse classpath, and no other jar is currently in the classpath: 1) google-collections-0.8.jar 2) scala-library.jar 3) spark-core_2.9.3-0.8.1-incubating.jar Am I using the correct jar files to run the Java samples from Eclipse? Regards On Wed, Jan 15, 2014 at 4:36 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Could it be possible that you have an older version of JavaSparkContext (i.e. from an older version of Spark) in your path? Please check that there aren't two versions of Spark accidentally included in the classpath used in Eclipse. It would not give errors in the imports (as it finds the imported packages and classes) but would give such errors if it is unfortunately finding an older version of the JavaSparkContext class in the classpath. TD On Wed, Jan 15, 2014 at 4:14 PM, arjun biswas arjunbiswas@gmail.com wrote: Hello All, I have installed Spark on my machine and was successful in running sbt/sbt package as well as sbt/sbt assembly. I am trying to run the examples in Java from Eclipse; to be precise, I am trying to run the JavaLogQuery example. The issue is I am unable to resolve the compilation problem of jarOfClass not being available inside the JavaSparkContext. I have added all the possible jars and am using Spark 0.8.1-incubating, which is the latest one, with Scala 2.9.3. I have added all jars to the classpath to the point that I do not get any import error. However, JavaSparkContext.jarOfClass gives the above error saying the jarOfClass method is unavailable in the JavaSparkContext. I am using Spark 0.8.1-incubating and Scala 2.9.3. Has anyone tried to run the Java sample examples from Eclipse? Please note that this is a compile-time error in Eclipse. Regards Arjun
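For anyone stuck on 0.8.1, where JavaSparkContext.jarOfClass does not exist yet, the pre-0.9 approach is to pass the jar paths to the context constructor explicitly. A minimal sketch (all paths are placeholders, and the constructor arity is from memory of the 0.8.x API, so double-check against your version):

import org.apache.spark.api.java.JavaSparkContext

object JarOfClassWorkaround {
  def main(args: Array[String]) {
    // On 0.8.1, list the application jar(s) by hand instead of calling
    // jarOfClass (which only appeared in the Java API in 0.9.0).
    val jars = Array("/path/to/your-app.jar")
    val jsc = new JavaSparkContext("local", "JavaLogQuery", "/path/to/spark", jars)
    println(jsc.parallelize(java.util.Arrays.asList(1, 2, 3)).count())
    jsc.stop()
  }
}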