Looks like your driver is not able to connect to the remote executor on machine2/130.49.226.148:60949. Can you check if the master machine can route to 130.49.226.148?
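One rough way to run that check is a plain TCP connect, tried from the master and from machine2 itself. This is only a sketch: `can_connect` is a hypothetical helper written for this thread, not part of Spark, and the 3-second timeout is an arbitrary assumption.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for this thread; it is not a Spark API.
    """
    try:
        # create_connection resolves the host name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable routes and DNS failures.
        return False
```

For example, `can_connect("130.49.226.148", 60949)` and `can_connect("machine2", 48019)` use the addresses from the worker log. A False result only says that nothing accepted the connection at that moment. It does not distinguish a routing problem from a process that has already exited.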
Sujeet

On Fri, Jun 27, 2014 at 12:04 PM, Shannon Quinn <squ...@gatech.edu> wrote:

> For some reason, commenting out spark.driver.host and spark.driver.port
> fixed something...and broke something else (or at least revealed another
> problem). For reference, the only lines I have in my spark-defaults.conf
> now:
>
> spark.app.name myProg
> spark.master spark://192.168.1.101:5060
> spark.executor.memory 8g
> spark.files.overwrite true
>
> It starts up, but has problems with machine2. For some reason, machine2 is
> having trouble communicating with *itself*. Here are the worker logs of one
> of the failures (there are 10 before it quits):
>
>
> Spark assembly has been built with Hive, including Datanucleus jars on
> classpath
> 14/06/27 14:55:13 INFO ExecutorRunner: Launch command: "java" "-cp"
> "::/home/spark/spark-1.0.0-bin-hadoop2/conf:/home/spark/
> spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.
> 2.0.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-
> rdbms-3.2.1.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/
> datanucleus-core-3.2.2.jar:/home/spark/spark-1.0.0-bin-
> hadoop2/lib/datanucleus-api-jdo-3.2.1.jar" "-XX:MaxPermSize=128m"
> "-Xms8192M" "-Xmx8192M"
> "org.apache.spark.executor.CoarseGrainedExecutorBackend"
> "akka.tcp://spark@machine1:46378/user/CoarseGrainedScheduler" "7"
> "machine2" "8" "akka.tcp://sparkWorker@machine2:48019/user/Worker"
> "app-20140627144512-0001"
> 14/06/27 14:56:54 INFO Worker: Executor app-20140627144512-0001/7 finished
> with state FAILED message Command exited with code 1 exitStatus 1
> 14/06/27 14:56:54 INFO LocalActorRef: Message [akka.remote.transport.
> ActorTransportAdapter$DisassociateUnderlying] from
> Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/
> system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%
> 2FsparkWorker%40130.49.226.148%3A53561-38#-1924573003] was not delivered.
> [10] dead letters encountered. This logging can be turned off or adjusted
> with configuration settings 'akka.log-dead-letters' and
> 'akka.log-dead-letters-during-shutdown'.
> 14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@machine2:48019] ->
> [akka.tcp://sparkExecutor@machine2:60949]:
> Error [Association failed with [akka.tcp://sparkExecutor@machine2:60949]]
> [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkExecutor@machine2:60949]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: machine2/130.49.226.148:60949
> ]
> 14/06/27 14:56:54 INFO Worker: Asked to launch executor
> app-20140627144512-0001/8 for Funtown, USA
> 14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@machine2:48019] ->
> [akka.tcp://sparkExecutor@machine2:60949]:
> Error [Association failed with [akka.tcp://sparkExecutor@machine2:60949]]
> [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkExecutor@machine2:60949]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: machine2/130.49.226.148:60949
> ]
> 14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@machine2:48019] ->
> [akka.tcp://sparkExecutor@machine2:60949]:
> Error [Association failed with [akka.tcp://sparkExecutor@machine2:60949]]
> [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkExecutor@machine2:60949]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
> Connection refused: machine2/130.49.226.148:60949
> ]
>
> Port 48019 on machine2 is indeed open, connected, and listening. Any ideas?
>
> Thanks!
>
> Shannon
>
> On 6/27/14, 1:54 AM, sujeetv wrote:
>
>> Try to explicitly set the "spark.driver.host" property to the master's
>> IP.
>> Sujeet
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.
>> 1001560.n3.nabble.com/Spark-standalone-network-configuration-problems-
>> tp8304p8396.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>