Many thanks for your explanation. So what remains is my issue with the "TaskSchedulerImpl: Initial job has not accepted any resources" warning, which prevents me from getting started with Spark (or at least from running the examples successfully) ;)
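For reference, the memory settings Aaron mentions below belong in conf/spark-env.sh on each worker node; the snippet is only an illustrative sketch with example values, not my actual configuration:

    # conf/spark-env.sh on every worker node (example values)
    # Total memory the worker may hand out to executors (Aaron suggests at least 512 MB)
    export SPARK_WORKER_MEMORY=1g
    # Memory for the worker daemon JVM itself, which can stay small
    export SPARK_DAEMON_MEMORY=256m

After changing these values the workers need to be restarted so that they re-register with the master and report the new amount of memory.
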
br, Gerd


On 13 April 2014 10:17, Aaron Davidson <ilike...@gmail.com> wrote:
> By the way, 64 MB of RAM per machine is really small, I'm surprised Spark
> can even start up on that! Perhaps you meant to set SPARK_DAEMON_MEMORY so
> that the actual worker process itself would be small, but
> SPARK_WORKER_MEMORY (which controls the amount of memory available for
> Spark executors) should be at least 512 MB, and ideally many times that.
>
>
> On Sun, Apr 13, 2014 at 1:14 AM, Aaron Davidson <ilike...@gmail.com> wrote:
>
>> This was actually a bug in the log message itself, where the Master would
>> print its own ip and port instead of the registered worker's. It has been
>> fixed in 0.9.1 and 1.0.0 (here's the patch:
>> https://github.com/apache/spark/commit/c0795cf481d47425ec92f4fd0780e2e0b3fdda85
>> ).
>>
>> Sorry about the confusion!
>>
>>
>> On Sat, Apr 12, 2014 at 4:36 PM, Mark Baker <m...@coactus.com> wrote:
>>
>>> On Sat, Apr 12, 2014 at 9:19 AM, ge ko <koenig....@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I'm wondering why the master is registering itself at startup, exactly
>>> > 3 times (the same number as the number of workers). Log excerpt:
>>> > ""
>>> > 2014-04-11 21:08:15,363 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger started
>>> > 2014-04-11 21:08:15,478 INFO Remoting: Starting remoting
>>> > 2014-04-11 21:08:15,838 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077]
>>> > 2014-04-11 21:08:16,252 INFO org.apache.spark.deploy.master.Master: Starting Spark master at spark://hadoop-pg-5.cluster:7077
>>> > 2014-04-11 21:08:16,299 INFO org.eclipse.jetty.server.Server: jetty-7.x.y-SNAPSHOT
>>> > 2014-04-11 21:08:16,341 INFO org.eclipse.jetty.server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:18080
>>> > 2014-04-11 21:08:16,343 INFO org.apache.spark.deploy.master.ui.MasterWebUI: Started Master web UI at http://hadoop-pg-5.cluster:18080
>>> > 2014-04-11 21:08:16,374 INFO org.apache.spark.deploy.master.Master: I have been elected leader! New state: ALIVE
>>> > 2014-04-11 21:08:21,492 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>> > 2014-04-11 21:08:31,362 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>> > 2014-04-11 21:08:34,819 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>> > ""
>>> >
>>> > The workers should be hadoop-pg-7/-8/-9.
>>> > This seems strange to me, or am I just interpreting the log entries wrong?!
>>> > Perhaps this relates to my other post titled "TaskSchedulerImpl: Initial
>>> > job has not accepted any resources" and both issues are caused by the
>>> > same problem/misconfiguration.
>>> >
>>> > All the nodes can reach each other via ping and telnet on the
>>> > corresponding ports.
>>> >
>>> > Any hints what is going wrong there?
>>>
>>> I had this happen in a VirtualBox configuration I was testing with, and
>>> never got it fixed, but suspected it was because I wasn't using FQDNs
>>> (as you're also not doing). I had found - can't find it again now, of
>>> course - a message in the archives from somebody who was suffering from
>>> this issue while still using FQDNs, but who claimed that they found an
>>> error in /etc/hosts.
>>>
>>> Good luck.
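
Following up on Mark's /etc/hosts hint: below is a rough sketch of what consistent entries on all nodes could look like (the IP addresses are only placeholders; the hostnames are the ones from the log above). The same mapping should be present, identically, on the master and on every worker:

    # /etc/hosts - identical on the master and all workers; IPs are examples only
    192.168.1.5   hadoop-pg-5.cluster   hadoop-pg-5
    192.168.1.7   hadoop-pg-7.cluster   hadoop-pg-7
    192.168.1.8   hadoop-pg-8.cluster   hadoop-pg-8
    192.168.1.9   hadoop-pg-9.cluster   hadoop-pg-9

Each node should also resolve its own hostname to its real network address rather than to 127.0.0.1, otherwise a worker may register with an address the master (or the driver) cannot reach.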