This was actually a bug in the log message itself, where the Master would
print its own IP and port instead of the registered worker's. It has been
fixed in 0.9.1 and 1.0.0 (here's the patch:
https://github.com/apache/spark/commit/c0795cf481d47425ec92f4fd0780e2e0b3fdda85
).
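
For the curious, the bug was roughly of this shape. Here's a minimal sketch
in Scala with illustrative names (this is not the actual Master.scala code,
see the patch above for that): the registration handler's log line referenced
the Master's own fields instead of the parameters carried in the worker's
registration message.

    class Master(host: String, port: Int) {
      def registerWorkerBuggy(workerHost: String, workerPort: Int,
                              cores: Int, memoryMb: Int): Unit =
        // Bug: `host` and `port` resolve to the Master's own fields, so every
        // registration logs the Master's address (hence the identical lines).
        println(s"Registering worker $host:$port with $cores cores, $memoryMb MB RAM")

      def registerWorkerFixed(workerHost: String, workerPort: Int,
                              cores: Int, memoryMb: Int): Unit =
        // Fix: log the registering worker's own address instead.
        println(s"Registering worker $workerHost:$workerPort with $cores cores, $memoryMb MB RAM")
    }

    object Demo extends App {
      val m = new Master("hadoop-pg-5.cluster", 7077)
      m.registerWorkerBuggy("hadoop-pg-7.cluster", 7078, 2, 64) // logs the master's address
      m.registerWorkerFixed("hadoop-pg-7.cluster", 7078, 2, 64) // logs the worker's address
    }

In other words, the three identical "Registering worker" lines really are
your three workers registering successfully; only the printed address was
wrong.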

Sorry about the confusion!


On Sat, Apr 12, 2014 at 4:36 PM, Mark Baker <m...@coactus.com> wrote:

> On Sat, Apr 12, 2014 at 9:19 AM, ge ko <koenig....@gmail.com> wrote:
> > Hi,
> >
> > I'm wondering why the master is registering itself at startup, exactly 3
> > times (the same as the number of workers). Log excerpt:
> > ""
> > 2014-04-11 21:08:15,363 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger
> > started
> > 2014-04-11 21:08:15,478 INFO Remoting: Starting remoting
> > 2014-04-11 21:08:15,838 INFO Remoting: Remoting started; listening on
> > addresses :[akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077]
> > 2014-04-11 21:08:16,252 INFO org.apache.spark.deploy.master.Master:
> Starting
> > Spark master at spark://hadoop-pg-5.cluster:7077
> > 2014-04-11 21:08:16,299 INFO org.eclipse.jetty.server.Server:
> > jetty-7.x.y-SNAPSHOT
> > 2014-04-11 21:08:16,341 INFO org.eclipse.jetty.server.AbstractConnector:
> > Started SelectChannelConnector@0.0.0.0:18080
> > 2014-04-11 21:08:16,343 INFO
> org.apache.spark.deploy.master.ui.MasterWebUI:
> > Started Master web UI at http://hadoop-pg-5.cluster:18080
> > 2014-04-11 21:08:16,374 INFO org.apache.spark.deploy.master.Master: I
> have
> > been elected leader! New state: ALIVE
> > 2014-04-11 21:08:21,492 INFO org.apache.spark.deploy.master.Master:
> > Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
> > 2014-04-11 21:08:31,362 INFO org.apache.spark.deploy.master.Master:
> > Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
> > 2014-04-11 21:08:34,819 INFO org.apache.spark.deploy.master.Master:
> > Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
> > ""
> >
> > The workers should be hadoop-pg-7/-8/-9.
> > This seems strange to me, or am I just interpreting the log entries wrong?
> > Perhaps this relates to my other post titled "TaskSchedulerImpl: Initial job
> > has not accepted any resources", and both issues are caused by the same
> > problem/misconfiguration.
> >
> > All the nodes can reach each other via ping and telnet on the
> > corresponding ports.
> >
> > Any hints on what is going wrong there?
>
> I had this happen in a VirtualBox configuration I was testing with,
> and never got it fixed, but suspected it was because I wasn't using
> FQDNs (as you're also not doing). I had found - can't find it again
> now, of course - a message in the archives from somebody who was
> suffering from this issue even though they were using FQDNs, but who
> claimed they had found an error in /etc/hosts.
>
> Good luck.
>
