On Sat, Apr 12, 2014 at 9:19 AM, ge ko <koenig....@gmail.com> wrote:
> Hi,
>
> I'm wondering why the master is registering itself at startup, exactly 3
> times (same number as the number of workers). Log excerpt:
> ""
> 2014-04-11 21:08:15,363 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger started
> 2014-04-11 21:08:15,478 INFO Remoting: Starting remoting
> 2014-04-11 21:08:15,838 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077]
> 2014-04-11 21:08:16,252 INFO org.apache.spark.deploy.master.Master: Starting Spark master at spark://hadoop-pg-5.cluster:7077
> 2014-04-11 21:08:16,299 INFO org.eclipse.jetty.server.Server: jetty-7.x.y-SNAPSHOT
> 2014-04-11 21:08:16,341 INFO org.eclipse.jetty.server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:18080
> 2014-04-11 21:08:16,343 INFO org.apache.spark.deploy.master.ui.MasterWebUI: Started Master web UI at http://hadoop-pg-5.cluster:18080
> 2014-04-11 21:08:16,374 INFO org.apache.spark.deploy.master.Master: I have been elected leader! New state: ALIVE
> 2014-04-11 21:08:21,492 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
> 2014-04-11 21:08:31,362 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
> 2014-04-11 21:08:34,819 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
> ""
>
> The workers should be hadoop-pg-7/-8/-9
> This seems strange to me, or do I just interpret the log entry wrong ?!?!
> Perhaps this relates to my other post titled "TaskSchedulerImpl: Initial job
> has not accepted any resources" and both issues are caused by the same
> problem/misconfiguration.
>
> All the nodes can reach each other via ping and telnet on the corresponding
> ports.
>
> Any hints what is going wrong there?

I had this happen in a VirtualBox setup I was testing with and never got it fixed, but I suspected it was because I wasn't using FQDNs (as you're also not doing). I had found a message in the archives (I can't find it again now, of course) from somebody who hit the same issue while using FQDNs and who said they had traced it to an error in /etc/hosts. Good luck.
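
For what it's worth, one common kind of /etc/hosts mistake is a machine's own hostname being mapped to a loopback address (for example 127.0.1.1), so that services on that node may advertise an address other nodes cannot use. Whether that was the error in the archived message is not known; the sketch below only illustrates how one might check for it. The function name `loopback_hostnames` and the sample addresses are made up for the example; only the hostnames come from the log above.

```python
import ipaddress


def loopback_hostnames(hosts_text):
    """Return {hostname: address} for names (other than localhost aliases)
    that a hosts file maps to a loopback address."""
    flagged = {}
    for raw in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # not a parseable address, skip the line
        if not ip.is_loopback:
            continue
        for name in names:
            # localhost and the usual ip6-* aliases are expected on loopback.
            if name.startswith("localhost") or name.startswith("ip6-"):
                continue
            flagged[name] = addr
    return flagged


# Hypothetical hosts file contents; the IP addresses are invented.
SAMPLE = """\
127.0.0.1   localhost
127.0.1.1   hadoop-pg-7.cluster hadoop-pg-7   # suspicious mapping
10.0.0.5    hadoop-pg-5.cluster hadoop-pg-5
"""

if __name__ == "__main__":
    for name, addr in loopback_hostnames(SAMPLE).items():
        print(f"{name} resolves to loopback address {addr}")
```

To run it against a real node, pass the contents of that node's /etc/hosts instead of `SAMPLE`, e.g. `loopback_hostnames(open("/etc/hosts").read())`. An empty result does not rule out other hosts-file or DNS problems.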