I am using Spark Standalone mode to deploy a Spark cluster (Spark v1.2.1) on 3 machines. The cluster is laid out as follows:
Machine A: Spark Master + Spark Worker
Machine B: Spark Worker
Machine C: Spark Worker

I start the Spark driver from /bin/spark-shell on Machine A. While the executors on machines B and C register successfully with the driver, the executor that tries to start on Machine A is repeatedly created and removed. Specifically, the output from the spark-shell is the following:

*spark-shell*
15/06/21 05:04:45 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@instance-trans2.c.bubbly-operator-90323.internal:63000/user/Executor#813048741] with ID 0
15/06/21 05:04:46 ERROR TaskSchedulerImpl: Lost executor 0 on instance-trans2.c.bubbly-operator-90323.internal: remote Akka client disassociated
15/06/21 05:04:46 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/06/21 05:04:46 INFO DAGScheduler: Executor lost: 0 (epoch 0)
15/06/21 05:04:46 INFO BlockManagerMasterActor: Trying to remove executor 0 from BlockManagerMaster.
15/06/21 05:04:46 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@instance-trans2.c.bubbly-operator-90323.internal:63000] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/06/21 05:04:46 INFO BlockManagerMaster: Removed 0 successfully in removeExecutor
15/06/21 05:04:46 INFO AppClient$ClientActor: Executor updated: app-20150621050443-0000/0 is now EXITED (Command exited with code 1)
15/06/21 05:04:46 INFO SparkDeploySchedulerBackend: Executor app-20150621050443-0000/0 removed: Command exited with code 1
15/06/21 05:04:46 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/06/21 05:04:46 INFO AppClient$ClientActor: Executor added: app-20150621050443-0000/1 on worker-20150621050415-instance-trans2.c.bubbly-operator-90323.internal-45146 (instance-trans2.c.bubbly-operator-90323.internal:45146) with 2 cores
15/06/21 05:04:46 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150621050443-0000/1 on hostPort instance-trans2.c.bubbly-operator-90323.internal:45146 with 2 cores, 4.0 GB RAM
15/06/21 05:04:46 INFO AppClient$ClientActor: Executor updated: app-20150621050443-0000/1 is now LOADING
15/06/21 05:04:46 INFO AppClient$ClientActor: Executor updated: app-20150621050443-0000/1 is now RUNNING
15/06/21 05:04:48 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@instance-trans2.c.bubbly-operator-90323.internal:63000/user/Executor#-538203403] with ID 1
15/06/21 05:04:50 ERROR TaskSchedulerImpl: Lost executor 1 on instance-trans2.c.bubbly-operator-90323.internal: remote Akka client disassociated
15/06/21 05:04:50 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
15/06/21 05:04:50 INFO DAGScheduler: Executor lost: 1 (epoch 1)
15/06/21 05:04:50 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@instance-trans2.c.bubbly-operator-90323.internal:63000] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

Moreover, to clarify my cluster setup further: I have assigned a different port to each of the Spark driver, the executor, and the block manager. However, the messages above keep being shown.
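Concretely, the port assignment was along these lines (a minimal sketch rather than the exact invocation; spark.driver.port=3389 and spark.executor.port=63000 match the addresses visible in the logs, while the block manager port shown here is only an illustrative placeholder):

*launch command (sketch)*
# Pin the driver, executor, and block manager ports when launching the shell.
# 3389 and 63000 are the ports that appear in the logs; 64000 is a placeholder.
./bin/spark-shell \
  --conf spark.driver.port=3389 \
  --conf spark.executor.port=63000 \
  --conf spark.blockManager.port=64000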
From a closer inspection inside the spark_worker directory, I found the following logs written in stderr:

*stderr*
15/06/21 05:04:47 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/06/21 05:04:47 INFO SecurityManager: Changing view acls to: root
15/06/21 05:04:47 INFO SecurityManager: Changing modify acls to: root
15/06/21 05:04:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/06/21 05:04:47 INFO Slf4jLogger: Slf4jLogger started
15/06/21 05:04:47 INFO Remoting: Starting remoting
15/06/21 05:04:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@instance-trans2.c.bubbly-operator-90323.internal:63000]
15/06/21 05:04:48 INFO Utils: Successfully started service 'driverPropsFetcher' on port 63000.
15/06/21 05:04:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/06/21 05:04:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/06/21 05:04:48 INFO SecurityManager: Changing view acls to: root
15/06/21 05:04:48 INFO SecurityManager: Changing modify acls to: root
15/06/21 05:04:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/06/21 05:04:48 INFO Slf4jLogger: Slf4jLogger started
15/06/21 05:04:48 INFO Remoting: Starting remoting
15/06/21 05:04:48 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/06/21 05:04:48 INFO Utils: Successfully started service 'sparkExecutor' on port 63000.
15/06/21 05:04:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@instance-trans2.c.bubbly-operator-90323.internal:63000]
15/06/21 05:04:48 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@instance-trans2.c.bubbly-operator-90323.internal:3389/user/CoarseGrainedScheduler
15/06/21 05:04:48 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@instance-trans2.c.bubbly-operator-90323.internal:45146/user/Worker
15/06/21 05:04:48 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@instance-trans2.c.bubbly-operator-90323.internal:45146/user/Worker
15/06/21 05:04:48 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
15/06/21 05:04:48 INFO Executor: Starting executor ID 1 on host instance-trans2.c.bubbly-operator-90323.internal
15/06/21 05:04:48 INFO SecurityManager: Changing view acls to: root
15/06/21 05:04:48 INFO SecurityManager: Changing modify acls to: root
15/06/21 05:04:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/06/21 05:04:48 INFO AkkaUtils: Connecting to MapOutputTracker: akka.tcp://sparkDriver@instance-trans2.c.bubbly-operator-90323.internal:3389/user/MapOutputTracker
15/06/21 05:04:48 INFO AkkaUtils: Connecting to BlockManagerMaster: akka.tcp://sparkDriver@instance-trans2.c.bubbly-operator-90323.internal:3389/user/BlockManagerMaster
15/06/21 05:04:48 INFO DiskBlockManager: Created local directory at /u01/spark_dir/spark_local/spark-833c27e7-4589-4f64-a0df-534d2e4ef055/spark-3e491a9d-d615-48f7-9077-f87ef7ca0c0e/spark-49d1f4bb-c3b6-4458-8453-7037144ac2a1/spark$
15/06/21 05:04:48 INFO MemoryStore: MemoryStore started with capacity 2.1 GB
15/06/21 05:04:49 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@instance-trans2.c.bubbly-operator-90323.internal:3389/user/CoarseGrainedScheduler
15/06/21 05:04:49 ERROR OneForOneStrategy: Address already in use
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:475)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1021)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:455)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:440)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:844)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:194)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:340)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)
15/06/21 05:04:49 ERROR CoarseGrainedExecutorBackend: Slave registration failed: Duplicate executor ID: 1

Therefore, I would like to ask: is it possible to start a Spark driver and a Spark executor on the same machine, and if so, is there a case where the two somehow end up bound to the same port at some point?
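For anyone trying to reproduce this, a quick way to check which process holds the contested port on Machine A while the executor keeps restarting would be something like the following (standard Linux tools, nothing Spark-specific; 63000 is the executor port from the logs above):

*port check (sketch)*
# Show the process, if any, currently bound to the executor port.
sudo lsof -i :63000
# Alternatively, list listening TCP sockets with the owning PID/program name.
sudo netstat -tlnp | grep 63000

If a previous executor, or the driver itself, shows up there, that would explain the java.net.BindException above.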