By the way, you can run sc.getConf.get("spark.driver.host") inside spark-shell whether or not the Executors actually start up successfully.
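For example, a quick sanity check looks something like this (194.168.1.105 is just the example address from earlier in the thread; substitute whatever address your driver should bind to, and expect the printed value to match it rather than "localhost"):

    $ SPARK_LOCAL_IP=194.168.1.105 ./bin/spark-shell
    scala> sc.getConf.get("spark.driver.host")
    res0: String = 194.168.1.105

If that call returns "localhost", the executors will be handed a localhost address and you'll see exactly the Driver Disassociated error quoted below.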
On Tue, Jul 8, 2014 at 8:23 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> You actually should avoid setting SPARK_PUBLIC_DNS unless necessary, I thought you might have preemptively done so. I think the issue is actually related to your network configuration, as Spark probably failed to find your driver's ip address. Do you see a warning on the driver that looks something like "Your hostname, localhost resolves to a loopback address, but we couldn't find any external IP address"?
>
> Either way, let's try to set SPARK_LOCAL_IP (see http://spark.apache.org/docs/latest/configuration.html) inside ~/spark/conf/spark-env.sh on your driver machine to an IP address that's reachable by your executors. Something like
>
>     export SPARK_LOCAL_IP=194.168.1.105
>
> You can make sure it was set correctly by running sc.getConf.get("spark.driver.host"), which should return the driver hostname, and NOT "localhost".
>
> (Note that it's also possible that your /etc/hosts file contains a mapping from the driver's ip address to localhost, which it should not.)
>
> On Tue, Jul 8, 2014 at 2:23 PM, Sameer Tilak <ssti...@live.com> wrote:
>
>> Hi Aaron,
>> Would really appreciate your help if you can point me to the documentation. Is this something that I need to do with /etc/hosts on each of the worker machines? Or do I set SPARK_PUBLIC_DNS (if yes, what is the format?) or something else?
>>
>> I have the following set up:
>>
>> master node:  pzxnvm2018.x.y.org
>> worker nodes: pzxnvm2022.x.y.org  pzxnvm2023.x.y.org  pzxnvm2024.x.y.org
>>
>> From: ilike...@gmail.com
>> Date: Tue, 8 Jul 2014 11:59:54 -0700
>> Subject: Re: CoarseGrainedExecutorBackend: Driver Disassociated
>> To: user@spark.apache.org
>>
>> Hmm, looks like the Executor is trying to connect to the driver on localhost, from this line:
>>
>>     14/07/08 11:07:13 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@localhost:39701/user/CoarseGrainedScheduler
>>
>> What is your setup? Standalone mode with 4 separate machines? Are you configuring the driver public dns name somewhere?
>>
>> On Tue, Jul 8, 2014 at 11:52 AM, Sameer Tilak <ssti...@live.com> wrote:
>>
>> Dear All,
>>
>> When I look inside the following directory on my worker node:
>> $SPARK_HOME/work/app-20140708110707-0001/3
>>
>> I see the following error message:
>>
>> log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).
>> log4j:WARN Please initialize the log4j system properly.
>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>> 14/07/08 11:07:11 INFO SparkHadoopUtil: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> 14/07/08 11:07:11 INFO SecurityManager: Changing view acls to: p529444
>> 14/07/08 11:07:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(p529444)
>> 14/07/08 11:07:12 INFO Slf4jLogger: Slf4jLogger started
>> 14/07/08 11:07:12 INFO Remoting: Starting remoting
>> 14/07/08 11:07:13 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkexecu...@pzxnvm2022.dcld.pldc.kp.org:34679]
>> 14/07/08 11:07:13 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkexecu...@pzxnvm2022.x.y.name.org:34679]
>> 14/07/08 11:07:13 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@localhost:39701/user/CoarseGrainedScheduler
>> 14/07/08 11:07:13 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkwor...@pzxnvm2022.x.y.name.org:37054/user/Worker
>> 14/07/08 11:07:13 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkexecu...@pzxnvm2022.dcld.pldc.kp.org:34679] -> [akka
>>
>> I am not sure what the problem is, but it is preventing me from getting the 4-node test cluster up and running.
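For reference, the driver-side changes described above would look roughly like the sketch below. The 194.168.1.105 address is just the earlier example value, and pzxnvm2018.x.y.org is the master node from your cluster listing; substitute your driver's actual routable address and hostname.

    # ~/spark/conf/spark-env.sh on the driver machine
    export SPARK_LOCAL_IP=194.168.1.105

    # /etc/hosts on the driver: the machine's own hostname should map to its
    # routable address, not to 127.0.0.1
    127.0.0.1        localhost
    194.168.1.105    pzxnvm2018.x.y.org    pzxnvm2018

With that in place, sc.getConf.get("spark.driver.host") should return the routable address, and the executors should stop trying to reach the driver at akka.tcp://spark@localhost.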