Many thanks for your explanation. So what remains is my issue with the "TaskSchedulerImpl: Initial job has not accepted any resources" warning, which prevents me from getting started with Spark (or at least from running the examples successfully) ;)
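For reference, the memory settings Aaron mentions below belong in conf/spark-env.sh on each worker node; the snippet is only an illustrative sketch with example values, not my actual configuration:

    # conf/spark-env.sh on every worker node (example values)
    # Total memory the worker may hand out to executors (Aaron suggests at least 512 MB)
    export SPARK_WORKER_MEMORY=1g
    # Memory for the worker daemon JVM itself, which can stay small
    export SPARK_DAEMON_MEMORY=256m

After changing these values the workers need to be restarted so that they re-register with the master and report the new amount of memory.
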
br, Gerd


On 13 April 2014 10:17, Aaron Davidson <ilike...@gmail.com> wrote:
> By the way, 64 MB of RAM per machine is really small, I'm surprised Spark
> can even start up on that! Perhaps you meant to set SPARK_DAEMON_MEMORY so
> that the actual worker process itself would be small, but
> SPARK_WORKER_MEMORY (which controls the amount of memory available for
> Spark executors) should be at least 512 MB, and ideally many times that.
>
>
> On Sun, Apr 13, 2014 at 1:14 AM, Aaron Davidson <ilike...@gmail.com> wrote:
>
>> This was actually a bug in the log message itself, where the Master would
>> print its own ip and port instead of the registered worker's. It has been
>> fixed in 0.9.1 and 1.0.0 (here's the patch:
>> https://github.com/apache/spark/commit/c0795cf481d47425ec92f4fd0780e2e0b3fdda85
>> ).
>>
>> Sorry about the confusion!
>>
>>
>> On Sat, Apr 12, 2014 at 4:36 PM, Mark Baker <m...@coactus.com> wrote:
>>
>>> On Sat, Apr 12, 2014 at 9:19 AM, ge ko <koenig....@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I'm wondering why the master is registering itself at startup, exactly
>>> > 3 times (the same number as the number of workers). Log excerpt:
>>> > ""
>>> > 2014-04-11 21:08:15,363 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger started
>>> > 2014-04-11 21:08:15,478 INFO Remoting: Starting remoting
>>> > 2014-04-11 21:08:15,838 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077]
>>> > 2014-04-11 21:08:16,252 INFO org.apache.spark.deploy.master.Master: Starting Spark master at spark://hadoop-pg-5.cluster:7077
>>> > 2014-04-11 21:08:16,299 INFO org.eclipse.jetty.server.Server: jetty-7.x.y-SNAPSHOT
>>> > 2014-04-11 21:08:16,341 INFO org.eclipse.jetty.server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:18080
>>> > 2014-04-11 21:08:16,343 INFO org.apache.spark.deploy.master.ui.MasterWebUI: Started Master web UI at http://hadoop-pg-5.cluster:18080
>>> > 2014-04-11 21:08:16,374 INFO org.apache.spark.deploy.master.Master: I have been elected leader! New state: ALIVE
>>> > 2014-04-11 21:08:21,492 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>> > 2014-04-11 21:08:31,362 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>> > 2014-04-11 21:08:34,819 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>> > ""
>>> >
>>> > The workers should be hadoop-pg-7/-8/-9.
>>> > This seems strange to me, or am I just interpreting the log entries wrong?!
>>> > Perhaps this relates to my other post titled "TaskSchedulerImpl: Initial
>>> > job has not accepted any resources" and both issues are caused by the
>>> > same problem/misconfiguration.
>>> >
>>> > All the nodes can reach each other via ping and telnet on the
>>> > corresponding ports.
>>> >
>>> > Any hints what is going wrong there?
>>>
>>> I had this happen in a VirtualBox configuration I was testing with, and
>>> never got it fixed, but suspected it was because I wasn't using FQDNs
>>> (as you're also not doing). I had found - can't find it again now, of
>>> course - a message in the archives from somebody who was suffering from
>>> this issue while still using FQDNs, but who claimed that they found an
>>> error in /etc/hosts.
>>>
>>> Good luck.
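
Following up on Mark's /etc/hosts hint: below is a rough sketch of what consistent entries on all nodes could look like (the IP addresses are only placeholders; the hostnames are the ones from the log above). The same mapping should be present, identically, on the master and on every worker:

    # /etc/hosts - identical on the master and all workers; IPs are examples only
    192.168.1.5   hadoop-pg-5.cluster   hadoop-pg-5
    192.168.1.7   hadoop-pg-7.cluster   hadoop-pg-7
    192.168.1.8   hadoop-pg-8.cluster   hadoop-pg-8
    192.168.1.9   hadoop-pg-9.cluster   hadoop-pg-9

Each node should also resolve its own hostname to its real network address rather than to 127.0.0.1, otherwise a worker may register with an address the master (or the driver) cannot reach.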