By the way, you can run sc.getConf.get("spark.driver.host") inside
spark-shell whether or not the executors actually start up
successfully.
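
On the /etc/hosts note in the quoted message below, here is a minimal shell
sketch (not a definitive check) that reports whether a hosts file maps a
hostname to a loopback address. The function name maps_to_loopback is made up
for illustration, it assumes a POSIX shell with awk, and it uses `uname -n`
as the hostname, which may not be the exact name Spark ends up using.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if a hosts file maps the given hostname to a
# loopback address (127.x.x.x or ::1). maps_to_loopback is a hypothetical
# helper name, not part of Spark.
maps_to_loopback() {
    host="$1"                    # hostname to look for
    file="${2:-/etc/hosts}"      # hosts file to inspect (default: /etc/hosts)
    awk -v h="$host" '
        /^[ \t]*#/ { next }      # skip full-line comments
        {
            sub(/#.*/, "")       # drop trailing comments; fields are re-split
            if ($1 ~ /^127\./ || $1 == "::1") {
                for (i = 2; i <= NF; i++) if ($i == h) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example use on the driver machine.
if [ -r /etc/hosts ] && maps_to_loopback "$(uname -n)" /etc/hosts; then
    echo "warning: $(uname -n) maps to a loopback address in /etc/hosts"
fi
```

Called with just a hostname it reads /etc/hosts; a second argument points it
at a different file. If it prints the warning, that mapping is a likely
reason the driver is advertising itself as localhost.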


On Tue, Jul 8, 2014 at 8:23 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> You should actually avoid setting SPARK_PUBLIC_DNS unless necessary; I
> thought you might have set it preemptively. I think the issue is actually
> related to your network configuration, as Spark probably failed to find
> your driver's IP address. Do you see a warning on the driver that looks
> something like "Your hostname, localhost resolves to a loopback address,
> but we couldn't find any external IP address"?
>
> Either way, let's try to set SPARK_LOCAL_IP (see
> http://spark.apache.org/docs/latest/configuration.html) inside
> ~/spark/conf/spark-env.sh on your driver machine to an IP address that's
> reachable by your executors. Something like
>
> export SPARK_LOCAL_IP=192.168.1.105
>
> You can make sure it was set correctly by running
> sc.getConf.get("spark.driver.host"), which should return the driver
> hostname, and NOT "localhost".
>
> (Note that it's also possible that your /etc/hosts file contains a mapping
> from the driver's IP address to localhost, which it should not.)
>
>
> On Tue, Jul 8, 2014 at 2:23 PM, Sameer Tilak <ssti...@live.com> wrote:
>
>> Hi Aaron,
>> I would really appreciate your help if you can point me to the
>> documentation. Is this something that I need to do with /etc/hosts on each
>> of the worker machines? Or do I set SPARK_PUBLIC_DNS (if yes, what is the
>> format?) or something else?
>>
>> I have the following set up:
>>
>> master node: pzxnvm2018.x.y.org
>> worker nodes: pzxnvm2022.x.y.org pzxnvm2023.x.y.org pzxnvm2024.x.y.org
>>
>>
>> From: ilike...@gmail.com
>> Date: Tue, 8 Jul 2014 11:59:54 -0700
>>
>> Subject: Re: CoarseGrainedExecutorBackend: Driver Disassociated
>> To: user@spark.apache.org
>>
>>
>> Hmm, looks like the Executor is trying to connect to the driver on
>> localhost, from this line:
>> 14/07/08 11:07:13 INFO CoarseGrainedExecutorBackend: Connecting to
>> driver: akka.tcp://spark@localhost:39701/user/CoarseGrainedScheduler
>>
>> What is your setup? Standalone mode with 4 separate machines? Are you
>> configuring the driver's public DNS name somewhere?
>>
>>
>> On Tue, Jul 8, 2014 at 11:52 AM, Sameer Tilak <ssti...@live.com> wrote:
>>
>> Dear All,
>>
>> When I look inside the following directory on my worker node:
>> $SPARK_HOME/work/app-20140708110707-0001/3
>>
>> I see the following error message:
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.hadoop.conf.Configuration).
>> log4j:WARN Please initialize the log4j system properly.
>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
>> more info.
>> 14/07/08 11:07:11 INFO SparkHadoopUtil: Using Spark's default log4j
>> profile: org/apache/spark/log4j-defaults.properties
>> 14/07/08 11:07:11 INFO SecurityManager: Changing view acls to: p529444
>> 14/07/08 11:07:11 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users with view permissions: Set(p529444)
>> 14/07/08 11:07:12 INFO Slf4jLogger: Slf4jLogger started
>> 14/07/08 11:07:12 INFO Remoting: Starting remoting
>> 14/07/08 11:07:13 INFO Remoting: Remoting started; listening on addresses
>> :[akka.tcp://sparkexecu...@pzxnvm2022.dcld.pldc.kp.org:34679]
>> 14/07/08 11:07:13 INFO Remoting: Remoting now listens on addresses:
>> [akka.tcp://sparkexecu...@pzxnvm2022.x.y.name.org:34679]
>> 14/07/08 11:07:13 INFO CoarseGrainedExecutorBackend: Connecting to
>> driver: akka.tcp://spark@localhost:39701/user/CoarseGrainedScheduler
>> 14/07/08 11:07:13 INFO WorkerWatcher: Connecting to worker akka.tcp://
>> sparkwor...@pzxnvm2022.x.y.name.org:37054/user/Worker
>> 14/07/08 11:07:13 ERROR CoarseGrainedExecutorBackend: Driver
>> Disassociated [akka.tcp://sparkexecu...@pzxnvm2022.dcld.pldc.kp.org:34679]
>> -> [akka
>>
>>
>> I am not sure what the problem is, but it is preventing me from getting
>> the 4-node test cluster up and running.
>>
>>
>>
>
