I switched which machine was the master and which was the dedicated
worker, and now it works just fine. I discovered machine2 is on my
department's DMZ; machine1 is not. I suspect the departmental firewall
was causing problems. Moving the master to machine2 seems to have
resolved the issue.
Thank you all very much for your help. I'm sure I'll have other
questions soon :)
Regards,
Shannon
On 6/27/14, 3:22 PM, Sujeet Varakhedi wrote:
Looks like your driver is not able to connect to the remote executor
at machine2/130.49.226.148:60949. Can you check if the master machine
can route to 130.49.226.148?
Sujeet
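[Editor's note: that routing/port check can be sketched with a short socket probe. This is an illustrative snippet, not anything from the thread; the IP and port below are taken from the error message above and should be adjusted as needed.]

```python
import socket

def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, and timed-out connections
        return False

# Probe the executor address from the AssociationError (values from the log).
print(can_connect("130.49.226.148", 60949))
```

A `False` here from the master machine, while the same probe succeeds from machine2 itself, would point at a firewall or routing problem rather than at Spark.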
On Fri, Jun 27, 2014 at 12:04 PM, Shannon Quinn <squ...@gatech.edu> wrote:
For some reason, commenting out spark.driver.host and
spark.driver.port fixed something...and broke something else (or
at least revealed another problem). For reference, these are the only
lines in my spark-defaults.conf now:
spark.app.name myProg
spark.master spark://192.168.1.101:5060
spark.executor.memory 8g
spark.files.overwrite true
It starts up, but has problems with machine2. For some reason,
machine2 is having trouble communicating with *itself*. Here are
the worker logs of one of the failures (there are 10 before it
quits):
Spark assembly has been built with Hive, including Datanucleus
jars on classpath
14/06/27 14:55:13 INFO ExecutorRunner: Launch command: "java"
"-cp"
"::/home/spark/spark-1.0.0-bin-hadoop2/conf:/home/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar"
"-XX:MaxPermSize=128m" "-Xms8192M" "-Xmx8192M"
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"akka.tcp://spark@machine1:46378/user/CoarseGrainedScheduler" "7"
"machine2" "8" "akka.tcp://sparkWorker@machine2:48019/user/Worker"
"app-20140627144512-0001"
14/06/27 14:56:54 INFO Worker: Executor app-20140627144512-0001/7
finished with state FAILED message Command exited with code 1
exitStatus 1
14/06/27 14:56:54 INFO LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
from Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40130.49.226.148%3A53561-38#-1924573003]
was not delivered. [10] dead letters encountered. This logging can
be turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@machine2:48019] ->
[akka.tcp://sparkExecutor@machine2:60949]: Error [Association
failed with [akka.tcp://sparkExecutor@machine2:60949]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@machine2:60949]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: machine2/130.49.226.148:60949
]
14/06/27 14:56:54 INFO Worker: Asked to launch executor
app-20140627144512-0001/8 for Funtown, USA
14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@machine2:48019] ->
[akka.tcp://sparkExecutor@machine2:60949]: Error [Association
failed with [akka.tcp://sparkExecutor@machine2:60949]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@machine2:60949]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: machine2/130.49.226.148:60949
]
14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@machine2:48019] ->
[akka.tcp://sparkExecutor@machine2:60949]: Error [Association
failed with [akka.tcp://sparkExecutor@machine2:60949]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@machine2:60949]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: machine2/130.49.226.148:60949
]
Port 48019 on machine2 is indeed open, connected, and listening.
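[Editor's note: a reading of the log, not from the thread: "Connection refused" on 60949 means nothing was listening on that port when the worker tried to associate. Since the executor exited with code 1, it may have died before ever binding; its stderr under the worker's work directory (e.g. $SPARK_HOME/work/app-20140627144512-0001/7/stderr, path assumed from the app id above) usually says why. The 60949 itself looks like an ephemeral port assigned by the OS at bind time, which this sketch illustrates:]

```python
import socket

# Illustration (not Spark code): binding to port 0 asks the OS for an
# ephemeral port. The executor's 60949 above is likely chosen the same
# way at startup; if the process dies before or after binding, later
# connection attempts to that port are refused.
s = socket.socket()
s.bind(("127.0.0.1", 0))
port = s.getsockname()[1]  # OS-assigned ephemeral port
print(port)
s.close()
```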
Any ideas?
Thanks!
Shannon
On 6/27/14, 1:54 AM, sujeetv wrote:
Try explicitly setting the "spark.driver.host" property to the
master's IP.
Sujeet
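[Editor's note: for reference, that suggestion would look like the following line in spark-defaults.conf. The address is the master IP used elsewhere in this thread; it should be whatever address the workers can actually route back to.]

```
spark.driver.host   192.168.1.101
```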
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-standalone-network-configuration-problems-tp8304p8396.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com.