See this answer by Josh:
http://stackoverflow.com/questions/26692658/cant-connect-from-application-to-the-standalone-cluster

You may also find this post useful:
http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3c7a889b1c-aa14-4cf2-8375-37f9cf827...@gmail.com%3E
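
In short, "All masters are unresponsive" usually means the driver is not
reaching the master at the exact URL the master advertises: a hostname/IP
mismatch, a wrong port, or a Spark version mismatch between the shell and
the cluster can all produce this symptom. As a quick check (a minimal
sketch, assuming the master web UI on its default port 8080 reports the
spark://xpan-biqa1:7077 URL that appears in your log), launch the shell
with exactly that string:

    $SPARK_HOME/bin/spark-shell --master spark://xpan-biqa1:7077

If the UI shows an IP address rather than the hostname, pass the IP form
instead; in Spark 1.x standalone mode the comparison is effectively an
exact string match.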

Thanks
Best Regards

On Wed, Feb 11, 2015 at 10:11 AM, lakewood <pxy0...@gmail.com> wrote:

> Hi,
>
> I'm new to Spark. I have built a small Spark-on-YARN cluster with 1
> master (20 GB RAM, 8 cores) and 3 workers (4 GB RAM, 4 cores each). When I
> run the command sc.parallelize(1 to 1000).count() through
> $SPARK_HOME/bin/spark-shell, sometimes the job is submitted successfully,
> and sometimes it fails with the exception below.
>
> I can confirm that all three workers are registered with the master; I
> checked this in the Spark web UI. I have configured the memory-related
> parameters in spark-env.sh as follows:
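>
>     SPARK_EXECUTOR_MEMORY=2G
>     SPARK_DRIVER_MEMORY=1G
>     SPARK_WORKER_MEMORY=4G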
>
> Would anyone give me a hint on how to resolve this issue? I have not
> found anything helpful from a Google search.
>
> # bin/spark-shell
> Spark assembly has been built with Hive, including Datanucleus jars on classpath
> 15/02/11 12:21:39 INFO SecurityManager: Changing view acls to: root,
> 15/02/11 12:21:39 INFO SecurityManager: Changing modify acls to: root,
> 15/02/11 12:21:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
> 15/02/11 12:21:39 INFO HttpServer: Starting HTTP Server
> 15/02/11 12:21:39 INFO Utils: Successfully started service 'HTTP class server' on port 28968.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
>       /_/
>
> Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.6.0_24)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/02/11 12:21:43 INFO SecurityManager: Changing view acls to: root,
> 15/02/11 12:21:43 INFO SecurityManager: Changing modify acls to: root,
> 15/02/11 12:21:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
> 15/02/11 12:21:44 INFO Slf4jLogger: Slf4jLogger started
> 15/02/11 12:21:44 INFO Remoting: Starting remoting
> 15/02/11 12:21:44 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@xpan-biqa1:6862]
> 15/02/11 12:21:44 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@xpan-biqa1:6862]
> 15/02/11 12:21:44 INFO Utils: Successfully started service 'sparkDriver' on port 6862.
> 15/02/11 12:21:44 INFO SparkEnv: Registering MapOutputTracker
> 15/02/11 12:21:44 INFO SparkEnv: Registering BlockManagerMaster
> 15/02/11 12:21:44 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150211122144-ed26
> 15/02/11 12:21:44 INFO Utils: Successfully started service 'Connection manager for block manager' on port 40502.
> 15/02/11 12:21:44 INFO ConnectionManager: Bound socket to port 40502 with id = ConnectionManagerId(xpan-biqa1,40502)
> 15/02/11 12:21:44 INFO MemoryStore: MemoryStore started with capacity 265.0 MB
> 15/02/11 12:21:44 INFO BlockManagerMaster: Trying to register BlockManager
> 15/02/11 12:21:44 INFO BlockManagerMasterActor: Registering block manager xpan-biqa1:40502 with 265.0 MB RAM
> 15/02/11 12:21:44 INFO BlockManagerMaster: Registered BlockManager
> 15/02/11 12:21:44 INFO HttpFileServer: HTTP File server directory is /tmp/spark-0a80ce6b-6a05-4163-a97d-07753f627ec8
> 15/02/11 12:21:44 INFO HttpServer: Starting HTTP Server
> 15/02/11 12:21:44 INFO Utils: Successfully started service 'HTTP file server' on port 25939.
> 15/02/11 12:21:44 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 15/02/11 12:21:44 INFO SparkUI: Started SparkUI at http://xpan-biqa1:4040
> 15/02/11 12:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/02/11 12:21:46 INFO EventLoggingListener: Logging events to hdfs://xpan-biqa1:7020/spark/spark-shell-1423628505431
> 15/02/11 12:21:46 INFO AppClient$ClientActor: Connecting to master spark://xpan-biqa1:7077...
> 15/02/11 12:21:46 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> 15/02/11 12:21:46 INFO SparkILoop: Created spark context..
> Spark context available as sc.
>
> scala> 15/02/11 12:22:06 INFO AppClient$ClientActor: Connecting to master spark://xpan-biqa1:7077...
>
> scala> sc.parallelize(1 to 1000).count()
> 15/02/11 12:22:24 INFO SparkContext: Starting job: count at <console>:13
> 15/02/11 12:22:24 INFO DAGScheduler: Got job 0 (count at <console>:13) with 2 output partitions (allowLocal=false)
> 15/02/11 12:22:24 INFO DAGScheduler: Final stage: Stage 0(count at <console>:13)
> 15/02/11 12:22:24 INFO DAGScheduler: Parents of final stage: List()
> 15/02/11 12:22:24 INFO DAGScheduler: Missing parents: List()
> 15/02/11 12:22:24 INFO DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13), which has no missing parents
> 15/02/11 12:22:24 INFO MemoryStore: ensureFreeSpace(1088) called with curMem=0, maxMem=277842493
> 15/02/11 12:22:24 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1088.0 B, free 265.0 MB)
> 15/02/11 12:22:24 INFO MemoryStore: ensureFreeSpace(800) called with curMem=1088, maxMem=277842493
> 15/02/11 12:22:24 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 800.0 B, free 265.0 MB)
> 15/02/11 12:22:24 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on xpan-biqa1:40502 (size: 800.0 B, free: 265.0 MB)
> 15/02/11 12:22:24 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
> 15/02/11 12:22:24 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13)
> 15/02/11 12:22:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
> 15/02/11 12:22:26 INFO AppClient$ClientActor: Connecting to master spark://xpan-biqa1:7077...
> 15/02/11 12:22:39 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
> 15/02/11 12:22:46 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
> 15/02/11 12:22:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 15/02/11 12:22:46 INFO TaskSchedulerImpl: Cancelling stage 0
> 15/02/11 12:22:46 INFO DAGScheduler: Failed to run count at <console>:13
> 15/02/11 12:22:46 INFO SparkUI: Stopped Spark web UI at http://xpan-biqa1:4040
> 15/02/11 12:22:46 INFO DAGScheduler: Stopping DAGScheduler
> 15/02/11 12:22:46 INFO SparkDeploySchedulerBackend: Shutting down all executors
> 15/02/11 12:22:46 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
> org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> scala> 15/02/11 12:22:47 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
>
>
> Regards,
> Ryan
>
