@YouPeng, @Aaron: many thanks for the memory-setting hint. That solved the issue; I just increased it to the default value of 512 MB.

thanks, Gerd

On 14 April 2014 03:22, YouPeng Yang <yypvsxf19870...@gmail.com> wrote:

> Hi
>
> 512 MB is the default memory size that each executor requests, and your
> job may not actually need that much. You can create a SparkContext with
>
>   sc = new SparkContext("local-cluster[2,1,512]", "test") // suppose you use the local-cluster model
>
> Here the 512 is the memory size; you can change it.
>
> 2014-04-14 7:22 GMT+08:00 Aaron Davidson <ilike...@gmail.com>:
>
>> This is usually due to a memory misconfiguration somewhere. Your job may
>> be requesting that each executor has 512 MB, and your cluster may not be
>> able to satisfy that (if you're only allowing 64 MB executors, for
>> instance). Try setting spark.executor.memory to be the same as
>> SPARK_WORKER_MEMORY.
>>
>> On Sun, Apr 13, 2014 at 12:47 PM, Gerd Koenig
>> <koenig.boden...@googlemail.com> wrote:
>>
>>> Many thanks for your explanation.
>>>
>>> So there's just my issue with that "TaskSchedulerImpl: Initial job has
>>> not accepted any resources" message that prevents me from getting started
>>> with Spark (at least from running the examples successfully) ;)
>>>
>>> br, Gerd
>>>
>>> On 13 April 2014 10:17, Aaron Davidson <ilike...@gmail.com> wrote:
>>>
>>>> By the way, 64 MB of RAM per machine is really small; I'm surprised
>>>> Spark can even start up on that! Perhaps you meant to set
>>>> SPARK_DAEMON_MEMORY so that the actual worker process itself would be
>>>> small, but SPARK_WORKER_MEMORY (which controls the amount of memory
>>>> available for Spark executors) should be at least 512 MB, and ideally
>>>> many times that.
>>>>
>>>> On Sun, Apr 13, 2014 at 1:14 AM, Aaron Davidson <ilike...@gmail.com> wrote:
>>>>
>>>>> This was actually a bug in the log message itself, where the Master
>>>>> would print its own IP and port instead of the registered worker's.
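Aaron's suggestion above (make spark.executor.memory line up with SPARK_WORKER_MEMORY) can be sketched as config fragments. This is only an illustration for a Spark 0.9.x standalone cluster; the 512m value and file paths are the defaults discussed in this thread, not a recommendation for any particular workload:

```shell
# conf/spark-env.sh on each worker node -- a sketch, values illustrative:

# Total memory this worker may hand out to its executors.
export SPARK_WORKER_MEMORY=512m

# Keep the worker daemon process itself small (this is what Aaron suspects
# was set to 64 MB by mistake instead of the worker memory).
export SPARK_DAEMON_MEMORY=128m

# On the driver side, request executors no larger than what the workers
# offer, e.g. by passing this system property to the driver JVM:
#   -Dspark.executor.memory=512m
```

The point is simply that the executor request must fit inside the worker's offer; if every worker only advertises 64 MB and each executor asks for 512 MB, no executor can ever be placed and the "Initial job has not accepted any resources" message appears.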
>>>>> It has been fixed in 0.9.1 and 1.0.0 (here's the patch:
>>>>> https://github.com/apache/spark/commit/c0795cf481d47425ec92f4fd0780e2e0b3fdda85).
>>>>>
>>>>> Sorry about the confusion!
>>>>>
>>>>> On Sat, Apr 12, 2014 at 4:36 PM, Mark Baker <m...@coactus.com> wrote:
>>>>>
>>>>>> On Sat, Apr 12, 2014 at 9:19 AM, ge ko <koenig....@gmail.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I'm wondering why the master registers itself at startup, exactly 3
>>>>>> > times (the same number as the number of workers). Log excerpt:
>>>>>> >
>>>>>> > 2014-04-11 21:08:15,363 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger started
>>>>>> > 2014-04-11 21:08:15,478 INFO Remoting: Starting remoting
>>>>>> > 2014-04-11 21:08:15,838 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077]
>>>>>> > 2014-04-11 21:08:16,252 INFO org.apache.spark.deploy.master.Master: Starting Spark master at spark://hadoop-pg-5.cluster:7077
>>>>>> > 2014-04-11 21:08:16,299 INFO org.eclipse.jetty.server.Server: jetty-7.x.y-SNAPSHOT
>>>>>> > 2014-04-11 21:08:16,341 INFO org.eclipse.jetty.server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:18080
>>>>>> > 2014-04-11 21:08:16,343 INFO org.apache.spark.deploy.master.ui.MasterWebUI: Started Master web UI at http://hadoop-pg-5.cluster:18080
>>>>>> > 2014-04-11 21:08:16,374 INFO org.apache.spark.deploy.master.Master: I have been elected leader!
>>>>>> > New state: ALIVE
>>>>>> > 2014-04-11 21:08:21,492 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>>>>> > 2014-04-11 21:08:31,362 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>>>>> > 2014-04-11 21:08:34,819 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>>>>> >
>>>>>> > The workers should be hadoop-pg-7/-8/-9.
>>>>>> > This seems strange to me, or am I just interpreting the log entry
>>>>>> > wrong? Perhaps this relates to my other post titled "TaskSchedulerImpl:
>>>>>> > Initial job has not accepted any resources", and both issues are
>>>>>> > caused by the same problem/misconfiguration.
>>>>>> >
>>>>>> > All the nodes can reach each other via ping and telnet on the
>>>>>> > corresponding ports.
>>>>>> >
>>>>>> > Any hints on what is going wrong there?
>>>>>>
>>>>>> I had this happen in a VirtualBox configuration I was testing with
>>>>>> and never got it fixed, but I suspected it was because I wasn't using
>>>>>> FQDNs (as you're also not doing). I had found - can't find it again
>>>>>> now, of course - a message in the archives from somebody also
>>>>>> suffering from this issue while still using FQDNs, but who claimed
>>>>>> they found an error in /etc/hosts.
>>>>>>
>>>>>> Good luck.
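Mark's /etc/hosts hint comes down to every node resolving every other node's name to a reachable address, consistently. A hedged sketch of what a consistent /etc/hosts might look like on each node in this thread's cluster (the hostnames are the ones from the log; the IP addresses are invented for illustration):

```shell
# /etc/hosts -- same content on every node; IPs below are illustrative.
127.0.0.1     localhost

# Do NOT map a machine's own hostname to 127.0.0.1 here: that is a common
# cause of this kind of confusion, because daemons then bind or advertise
# the loopback address and other nodes register/connect with the wrong one.
192.168.1.5   hadoop-pg-5.cluster   hadoop-pg-5
192.168.1.7   hadoop-pg-7.cluster   hadoop-pg-7
192.168.1.8   hadoop-pg-8.cluster   hadoop-pg-8
192.168.1.9   hadoop-pg-9.cluster   hadoop-pg-9
```

A quick sanity check is to run `hostname -f` on each node and confirm it prints the same FQDN that the other nodes use to reach it.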