Spark standalone network configuration problems

Shannon Quinn Wed, 25 Jun 2014 18:08:32 -0700

Hi all,

I have a 2-machine Spark network I've set up: a master and worker on machine1, and a worker on machine2. When I run 'sbin/start-all.sh', everything starts up as it should. I see both workers listed on the UI page. The logs of both workers indicate successful registration with the Spark master.

The problems begin when I attempt to submit a job: I get an "address already in use" exception that crashes the program. It says "Failed to bind to " and lists the exact port and address of the master.

At this point, the only items I have set in my spark-env.sh are SPARK_MASTER_IP and SPARK_MASTER_PORT (non-standard, set to 5060).
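
For reference, that configuration would look roughly like the sketch below. The value `machine1` for SPARK_MASTER_IP is an assumption used only for illustration, since the actual address is not shown in this message; the port is the one stated above.

```shell
# conf/spark-env.sh (sketch)
# Assumption: "machine1" stands in for the master's real address, which is not given here.
export SPARK_MASTER_IP=machine1
# Non-standard master port, as described above (the standalone default is 7077).
export SPARK_MASTER_PORT=5060
```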

The next step I took, then, was to explicitly set SPARK_LOCAL_IP on the master to 127.0.0.1. This allows the master to successfully send out the jobs; however, it ends up canceling the stage after logging this sequence several times:

14/06/25 21:00:47 INFO AppClient$ClientActor: Executor added: app-20140625210032-0000/8 on worker-20140625205623-machine2-53597 (machine2:53597) with 8 cores
14/06/25 21:00:47 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140625210032-0000/8 on hostPort machine2:53597 with 8 cores, 8.0 GB RAM
14/06/25 21:00:47 INFO AppClient$ClientActor: Executor updated: app-20140625210032-0000/8 is now RUNNING
14/06/25 21:00:49 INFO AppClient$ClientActor: Executor updated: app-20140625210032-0000/8 is now FAILED (Command exited with code 1)

The "/8" started at "/1", eventually becomes "/9", and then "/10", at which point the program crashes. The worker on machine2 shows similar messages in its logs. Here are the last bunch:

14/06/25 21:00:31 INFO Worker: Executor app-20140625210032-0000/9 finished with state FAILED message Command exited with code 1 exitStatus 1
14/06/25 21:00:31 INFO Worker: Asked to launch executor app-20140625210032-0000/10 for app_name
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/06/25 21:00:32 INFO ExecutorRunner: Launch command: "java" "-cp" "::/home/spark/spark-1.0.0-bin-hadoop2/conf:/home/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar" "-XX:MaxPermSize=128m" "-Xms8192M" "-Xmx8192M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "*akka.tcp://spark@localhost:5060/user/CoarseGrainedScheduler*" "10" "machine2" "8" "akka.tcp://sparkWorker@machine2:53597/user/Worker" "app-20140625210032-0000"
14/06/25 21:00:33 INFO Worker: Executor app-20140625210032-0000/10 finished with state FAILED message Command exited with code 1 exitStatus 1

I highlighted the part that seemed strange to me; that's the master port number (I set it to 5060), and yet it's referencing localhost? Is this the reason why machine2 apparently can't seem to give a confirmation to the master once the job is submitted? (The logs from the worker on the master node indicate that it's running just fine)
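
To make that observation easier to check across many log lines, here is a minimal shell sketch. It pulls the host out of any `akka.tcp://spark@HOST:PORT` URL in a worker log and tests whether that host is a loopback name. The function names `driver_host` and `is_loopback` are made up for illustration, and the sketch only detects the pattern; it does not establish that this is the cause of the failures.

```shell
# Hypothetical helper: read log text on stdin and print each distinct host
# that appears in an akka.tcp://spark@HOST:PORT URL (the pattern seen in the
# launch command above). URLs for sparkWorker@... are not matched.
driver_host() {
  grep -o 'akka\.tcp://spark@[^:/"]*' | sed 's|.*@||' | sort -u
}

# Hypothetical helper: succeed (exit status 0) when the argument is a
# loopback hostname or address, fail (exit status 1) otherwise.
is_loopback() {
  case "$1" in
    localhost|127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}
```

Fed the launch command shown above, `driver_host` prints `localhost`, and `is_loopback localhost` succeeds, whereas `is_loopback machine2` fails. A worker log can be piped in the same way, for example `driver_host < worker.log`, where the file name is a placeholder.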


I appreciate any assistance you can offer!

Regards,
Shannon Quinn
