Hi,

Instead of spark://10.1.3.7:7077, use spark://vmsparkwin1:7077. Try this:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://vmsparkwin1:7077 --executor-memory 1G --total-executor-cores 2 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10
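The standalone master in this release is strict about the master URL: it only accepts connections addressed to the exact spark://host:port string it advertises at the top of the master Web UI and in the master logs, so the --master value (and MASTER, if you export it) has to match that string exactly. If the UI shows the hostname rather than the IP, you can also pin it down in conf/spark-env.sh on the master; a minimal sketch (adjust the hostname for your cluster):

# conf/spark-env.sh on the master node
export SPARK_MASTER_IP=vmsparkwin1    # hostname the master binds to and advertises
export SPARK_MASTER_PORT=7077         # default standalone master port

Then restart the master and the workers so everything agrees on the same URL.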
Thanks & Regards,
Meethu M

On Friday, 18 July 2014 7:51 AM, Jay Vyas <jayunit100.apa...@gmail.com> wrote:

I think I know what is happening to you. I've looked into this a bit just this week, so it's fresh in my brain :) Hope this helps.

When no workers are known to the master, IIRC, you get this message. I think this is how it works:

1) You start your master.
2) You start a slave, and give it the master URL as an argument.
3) The slave then binds to a random port.
4) The slave then does a handshake with the master, which you can see in the slave logs (it says something like "successfully connected to master at …"). Actually, I think the master also logs that it is now aware of a slave running on ip:port…

So in your case, I suspect, none of the slaves have connected to the master, and the job sits idle. This is similar to the YARN scenario of submitting a job to a resource manager with no node managers running.
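If you want to walk through those steps by hand and watch the handshake, here is a rough sketch (assuming a Spark 1.0 layout with the stock sbin/bin scripts, and vmsparkwin1 as the master host):

# on the master node: start the standalone master and note the spark:// URL it logs
./sbin/start-master.sh

# on each worker node: start a worker pointed at that exact URL
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://vmsparkwin1:7077

The worker log should show a line along the lines of "Successfully registered with master ...", and the master log and Web UI should then list that worker with its ip:port. If that registration never shows up on either side, the job will sit idle exactly as described below.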
On Jul 17, 2014, at 6:57 PM, ranjanp <piyush_ran...@hotmail.com> wrote:

> Hi,
> I am new to Spark and am trying out a stand-alone, 3-node (1 master, 2 workers) cluster.
>
> From the Web UI at the master, I see that the workers are registered. But when I try running the SparkPi example from the master node, I get the following message and then an exception.
>
> 14/07/17 01:20:36 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
> 14/07/17 01:20:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>
> I searched a bit for the above warning and found that others have encountered this problem before, but did not see a clear resolution except for this link:
> http://apache-spark-user-list.1001560.n3.nabble.com/TaskSchedulerImpl-Initial-job-has-not-accepted-any-resources-check-your-cluster-UI-to-ensure-that-woy-tt8247.html#a8444
>
> Based on the suggestion there, I tried supplying the --executor-memory option to spark-submit, but that did not help.
>
> Any suggestions? Here are the details of my setup:
> - 3 nodes (each with 4 CPU cores and 7 GB memory)
> - 1 node configured as master, and the other two configured as workers
> - Firewall is disabled on all nodes, and network communication between the nodes is not a problem
> - Edited conf/spark-env.sh on all nodes to set the following:
>   SPARK_WORKER_CORES=3
>   SPARK_WORKER_MEMORY=5G
> - The Web UI as well as the logs on the master show that the workers were able to register correctly.
> Also, the Web UI correctly shows the aggregate available memory and CPU cores on the workers:
>
> URL: spark://vmsparkwin1:7077
> Workers: 2
> Cores: 6 Total, 0 Used
> Memory: 10.0 GB Total, 0.0 B Used
> Applications: 0 Running, 0 Completed
> Drivers: 0 Running, 0 Completed
> Status: ALIVE
>
> I tried running the SparkPi example first using run-example (which was failing) and later directly using spark-submit, as shown below:
>
> $ export MASTER=spark://vmsparkwin1:7077
>
> $ echo $MASTER
> spark://vmsparkwin1:7077
>
> azureuser@vmsparkwin1 /cygdrive/c/opt/spark-1.0.0
> $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://10.1.3.7:7077 --executor-memory 1G --total-executor-cores 2 ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10
>
> The following is the full screen output:
>
> 14/07/17 01:20:13 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 14/07/17 01:20:13 INFO SecurityManager: Changing view acls to: azureuser
> 14/07/17 01:20:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(azureuser)
> 14/07/17 01:20:14 INFO Slf4jLogger: Slf4jLogger started
> 14/07/17 01:20:14 INFO Remoting: Starting remoting
> 14/07/17 01:20:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:49839]
> 14/07/17 01:20:14 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:49839]
> 14/07/17 01:20:14 INFO SparkEnv: Registering MapOutputTracker
> 14/07/17 01:20:14 INFO SparkEnv: Registering BlockManagerMaster
> 14/07/17 01:20:14 INFO DiskBlockManager: Created local directory at C:\cygwin\tmp\spark-local-20140717012014-b606
> 14/07/17 01:20:14 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
> 14/07/17 01:20:14 INFO ConnectionManager: Bound socket to port 49842 with id = ConnectionManagerId(vmsparkwin1.cssparkwin.b1.internal.cloudapp.net,49842)
> 14/07/17 01:20:14 INFO BlockManagerMaster: Trying to register BlockManager
> 14/07/17 01:20:14 INFO BlockManagerInfo: Registering block manager vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:49842 with 294.9 MB RAM
> 14/07/17 01:20:14 INFO BlockManagerMaster: Registered BlockManager
> 14/07/17 01:20:14 INFO HttpServer: Starting HTTP Server
> 14/07/17 01:20:14 INFO HttpBroadcast: Broadcast server started at http://10.1.3.7:49843
> 14/07/17 01:20:14 INFO HttpFileServer: HTTP File server directory is C:\cygwin\tmp\spark-6a076e92-53bb-4c7a-9e27-ce53a818146d
> 14/07/17 01:20:14 INFO HttpServer: Starting HTTP Server
> 14/07/17 01:20:15 INFO SparkUI: Started SparkUI at http://vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:4040
> 14/07/17 01:20:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/07/17 01:20:16 INFO SparkContext: Added JAR file:/C:/opt/spark-1.0.0/./lib/spark-examples-1.0.0-hadoop2.2.0.jar at http://10.1.3.7:49844/jars/spark-examples-1.0.0-hadoop2.2.0.jar with timestamp 1405560016316
> 14/07/17 01:20:16 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
> 14/07/17 01:20:16 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
> 14/07/17 01:20:16 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 10 output partitions (allowLocal=false)
> 14/07/17 01:20:16 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
> 14/07/17 01:20:16 INFO DAGScheduler: Parents of final stage: List()
> 14/07/17 01:20:16 INFO DAGScheduler: Missing parents: List()
> 14/07/17 01:20:16 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
> 14/07/17 01:20:16 INFO DAGScheduler: Submitting 10 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
> 14/07/17 01:20:16 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
> 14/07/17 01:20:31 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
> 14/07/17 01:20:36 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
> 14/07/17 01:20:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
> 14/07/17 01:20:56 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
> 14/07/17 01:21:01 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
> 14/07/17 01:21:16 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
> 14/07/17 01:21:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 14/07/17 01:21:16 INFO TaskSchedulerImpl: Cancelling stage 0
> 14/07/17 01:21:16 INFO DAGScheduler: Failed to run reduce at SparkPi.scala:35
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-with-spark-submit-formatting-corrected-tp10102.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.