Spark fails to run practically any standalone-mode job submitted to it. Local
mode works, and spark-shell works even against the standalone cluster, but
submitting any other job manually fails, with the worker logging the
following error:

2014-01-14 15:47:05,073 [sparkWorker-akka.actor.default-dispatcher-5] INFO 
org.apache.spark.deploy.worker.Worker - Connecting to master
spark://niko-VirtualBox:7077...
2014-01-14 15:47:05,715 [sparkWorker-akka.actor.default-dispatcher-2] INFO 
org.apache.spark.deploy.worker.Worker - Successfully registered with master
spark://niko-VirtualBox:7077
2014-01-14 15:47:23,408 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.Worker - Asked to launch executor
app-20140114154723-0000/0 for Spark test
2014-01-14 15:47:23,431 [sparkWorker-akka.actor.default-dispatcher-14] ERROR
akka.actor.OneForOneStrategy - 
java.lang.NullPointerException
        at java.io.File.<init>(File.java:251)
        at
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:213)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-01-14 15:47:23,514 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.Worker - Starting Spark worker
niko-VirtualBox.local:33576 with 1 cores, 6.8 GB RAM
2014-01-14 15:47:23,514 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.Worker - Spark home:
/home/niko/local/incubator-spark
2014-01-14 15:47:23,517 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.ui.WorkerWebUI - Started Worker web UI at
http://niko-VirtualBox.local:8081
2014-01-14 15:47:23,517 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.Worker - Connecting to master
spark://niko-VirtualBox:7077...
2014-01-14 15:47:23,528 [sparkWorker-akka.actor.default-dispatcher-3] INFO 
org.apache.spark.deploy.worker.Worker - Successfully registered with master
spark://niko-VirtualBox:7077


The master emits the following logs at the same time:

2014-01-14 15:47:05,683 [sparkMaster-akka.actor.default-dispatcher-4] INFO 
org.apache.spark.deploy.master.Master - Registering worker
niko-VirtualBox:33576 with 1 cores, 6.8 GB RAM
2014-01-14 15:47:23,090 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Registering app Spark test
2014-01-14 15:47:23,102 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Registered app Spark test with ID
app-20140114154723-0000
2014-01-14 15:47:23,216 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Launching executor
app-20140114154723-0000/0 on worker
worker-20140114154704-niko-VirtualBox.local-33576
2014-01-14 15:47:23,523 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Registering worker
niko-VirtualBox:33576 with 1 cores, 6.8 GB RAM
2014-01-14 15:47:23,525 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Attempted to re-register worker at
same address: akka.tcp://[email protected]:33576
2014-01-14 15:47:23,535 [sparkMaster-akka.actor.default-dispatcher-14] WARN 
org.apache.spark.deploy.master.Master - Got heartbeat from unregistered
worker worker-20140114154723-niko-VirtualBox.local-33576
...

Soon after this, the master decides the worker is dead, disassociates it,
and marks it DEAD in the web UI. The worker process, however, is still alive
and still believes it is connected to the master (as its log shows).
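For what it's worth, the java.io.File String constructor throws a
NullPointerException only when handed a null pathname (that is the
File.java:251 frame in the trace), so Worker.scala:213 is apparently
constructing a File from a null string. My guess is an unresolved Spark home
for the executor, but that is an assumption; the variable name below is
hypothetical. A minimal, JDK-only sketch of that exact failure:

```java
import java.io.File;

// Reproduces the NPE in the worker trace above: File's String constructor
// throws NullPointerException for a null pathname, matching the
// java.io.File.<init>(File.java:251) frame. "sparkHome" is only my guess
// at what is null inside Worker.scala:213.
public class NullFileDemo {
    public static void main(String[] args) {
        String sparkHome = null; // stand-in for a value the worker never resolved
        try {
            new File(sparkHome);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NPE from File constructor");
        }
    }
}
```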

I'm launching the job with the following command (the last argument is the
master URL; replacing it with local makes everything run fine):

java -cp \
  ./target/classes:/etc/hadoop/conf:$SPARK_HOME/conf:$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.0-mr1-cdh4.5.0.jar \
  SparkTest spark://niko-VirtualBox:7077

Relevant versions are:
Spark: current git HEAD fa75e5e1c50da7d1e6c6f41c2d6d591c1e8a025f
Hadoop: 2.0.0-mr1-cdh4.5.0
Scala: 2.10.3





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Akka-error-kills-workers-in-standalone-mode-tp537.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
