Spark fails to run practically any standalone-mode job submitted to it. Local mode works, and spark-shell works even against the standalone master, but submitting any other job manually fails, with the worker logging the following error:
2014-01-14 15:47:05,073 [sparkWorker-akka.actor.default-dispatcher-5] INFO org.apache.spark.deploy.worker.Worker - Connecting to master spark://niko-VirtualBox:7077...
2014-01-14 15:47:05,715 [sparkWorker-akka.actor.default-dispatcher-2] INFO org.apache.spark.deploy.worker.Worker - Successfully registered with master spark://niko-VirtualBox:7077
2014-01-14 15:47:23,408 [sparkWorker-akka.actor.default-dispatcher-14] INFO org.apache.spark.deploy.worker.Worker - Asked to launch executor app-20140114154723-0000/0 for Spark test
2014-01-14 15:47:23,431 [sparkWorker-akka.actor.default-dispatcher-14] ERROR akka.actor.OneForOneStrategy -
java.lang.NullPointerException
    at java.io.File.<init>(File.java:251)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:213)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-01-14 15:47:23,514 [sparkWorker-akka.actor.default-dispatcher-14] INFO org.apache.spark.deploy.worker.Worker - Starting Spark worker niko-VirtualBox.local:33576 with 1 cores, 6.8 GB RAM
2014-01-14 15:47:23,514 [sparkWorker-akka.actor.default-dispatcher-14] INFO org.apache.spark.deploy.worker.Worker - Spark home: /home/niko/local/incubator-spark
2014-01-14 15:47:23,517 [sparkWorker-akka.actor.default-dispatcher-14] INFO org.apache.spark.deploy.worker.ui.WorkerWebUI - Started Worker web UI at http://niko-VirtualBox.local:8081
2014-01-14 15:47:23,517 [sparkWorker-akka.actor.default-dispatcher-14] INFO org.apache.spark.deploy.worker.Worker - Connecting to master spark://niko-VirtualBox:7077...
2014-01-14 15:47:23,528 [sparkWorker-akka.actor.default-dispatcher-3] INFO org.apache.spark.deploy.worker.Worker - Successfully registered with master spark://niko-VirtualBox:7077
The master spits out the following logs at the same time:
2014-01-14 15:47:05,683 [sparkMaster-akka.actor.default-dispatcher-4] INFO org.apache.spark.deploy.master.Master - Registering worker niko-VirtualBox:33576 with 1 cores, 6.8 GB RAM
2014-01-14 15:47:23,090 [sparkMaster-akka.actor.default-dispatcher-15] INFO org.apache.spark.deploy.master.Master - Registering app Spark test
2014-01-14 15:47:23,102 [sparkMaster-akka.actor.default-dispatcher-15] INFO org.apache.spark.deploy.master.Master - Registered app Spark test with ID app-20140114154723-0000
2014-01-14 15:47:23,216 [sparkMaster-akka.actor.default-dispatcher-15] INFO org.apache.spark.deploy.master.Master - Launching executor app-20140114154723-0000/0 on worker worker-20140114154704-niko-VirtualBox.local-33576
2014-01-14 15:47:23,523 [sparkMaster-akka.actor.default-dispatcher-15] INFO org.apache.spark.deploy.master.Master - Registering worker niko-VirtualBox:33576 with 1 cores, 6.8 GB RAM
2014-01-14 15:47:23,525 [sparkMaster-akka.actor.default-dispatcher-15] INFO org.apache.spark.deploy.master.Master - Attempted to re-register worker at same address: akka.tcp://[email protected]:33576
2014-01-14 15:47:23,535 [sparkMaster-akka.actor.default-dispatcher-14] WARN org.apache.spark.deploy.master.Master - Got heartbeat from unregistered worker worker-20140114154723-niko-VirtualBox.local-33576
...
...
Soon after this, the master decides the worker is dead, disassociates it, and marks it DEAD in the web UI. The worker process, however, is still alive and still believes it is connected to the master (as its log shows).
I'm launching the job with the following command (the last argument is the master URL; replacing it with "local" makes everything run fine):
java -cp ./target/classes:/etc/hadoop/conf:$SPARK_HOME/conf:$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.0-mr1-cdh4.5.0.jar \
  SparkTest spark://niko-VirtualBox:7077
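The SparkTest driver itself is nothing special; it's roughly the following minimal sketch (the exact RDD operations don't matter, since any job submitted this way hits the same failure):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SparkTest {
  def main(args: Array[String]) {
    // args(0) is the master URL: "local" works, spark://niko-VirtualBox:7077 does not
    val sc = new SparkContext(args(0), "Spark test")
    // trivial job just to exercise the executors
    val count = sc.parallelize(1 to 100).map(_ * 2).count()
    println("count: " + count)
    sc.stop()
  }
}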
Relevant versions are:
Spark: current git HEAD fa75e5e1c50da7d1e6c6f41c2d6d591c1e8a025f
Hadoop: 2.0.0-mr1-cdh4.5.0
Scala: 2.10.3
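For comparison, spark-shell against the same standalone master works fine when started roughly like this (following the standalone-mode docs for this version):

MASTER=spark://niko-VirtualBox:7077 ./bin/spark-shell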