@YouPeng, @Aaron

Many thanks for the memory-setting hint.
That solved the issue; I just increased the memory to the default value of 512MB.
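
For anyone hitting the same problem: the change amounted to something like the following in conf/spark-env.sh on each worker node (a sketch, assuming the worker memory setting was the one that was too low):

  # conf/spark-env.sh: memory the worker offers to Spark executors
  SPARK_WORKER_MEMORY=512m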

thanks, Gerd


On 14 April 2014 03:22, YouPeng Yang <yypvsxf19870...@gmail.com> wrote:

> Hi
>
> 512MB is the default amount of memory that each executor requests, and
> your job probably does not need that much. You can create a SparkContext
> with
>
>   val sc = new SparkContext("local-cluster[2,1,512]", "test") // assuming you use local-cluster mode
>
> Here "local-cluster[2,1,512]" means 2 workers, 1 core per worker, and
> 512 MB of memory per worker; the 512 is the memory size, and you can
> change it.
>
>
> 2014-04-14 7:22 GMT+08:00 Aaron Davidson <ilike...@gmail.com>:
>
>> This is usually due to a memory misconfiguration somewhere. Your job may
>> be requesting 512MB for each executor, and your cluster may not be able
>> to satisfy that (if you're only allowing 64MB executors, for instance).
>> Try setting spark.executor.memory to the same value as
>> SPARK_WORKER_MEMORY.
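>>
>> A minimal sketch of the driver-side configuration (the master URL is the
>> one from your logs; the app name is a placeholder):
>>
>>   import org.apache.spark.{SparkConf, SparkContext}
>>
>>   val conf = new SparkConf()
>>     .setMaster("spark://hadoop-pg-5.cluster:7077")
>>     .setAppName("memory-test")
>>     .set("spark.executor.memory", "512m") // keep this <= SPARK_WORKER_MEMORY
>>   val sc = new SparkContext(conf)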
>>
>>
>> On Sun, Apr 13, 2014 at 12:47 PM, Gerd Koenig <
>> koenig.boden...@googlemail.com> wrote:
>>
>>> Many thanks for your explanation.
>>>
>>> So there's just my issue with that "TaskSchedulerImpl: Initial job has
>>> not accepted any resources" stuff that prevents me from starting with Spark
>>> (at least execute the examples successfully) ;)
>>>
>>> br, Gerd
>>>
>>>
>>> On 13 April 2014 10:17, Aaron Davidson <ilike...@gmail.com> wrote:
>>>
>>>> By the way, 64 MB of RAM per machine is really small; I'm surprised
>>>> Spark can even start up on that! Perhaps you meant to set
>>>> SPARK_DAEMON_MEMORY so that the actual worker process itself would be
>>>> small, but SPARK_WORKER_MEMORY (which controls the amount of memory
>>>> available for Spark executors) should be at least 512 MB, and ideally
>>>> many times that.
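>>>>
>>>> A sketch of that distinction in conf/spark-env.sh (the values here are
>>>> illustrative, not a recommendation for your cluster):
>>>>
>>>>   # Memory for the worker daemon process itself; this can stay small.
>>>>   SPARK_DAEMON_MEMORY=64m
>>>>   # Memory the worker makes available to executors; this should be large.
>>>>   SPARK_WORKER_MEMORY=2g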
>>>>
>>>>
>>>> On Sun, Apr 13, 2014 at 1:14 AM, Aaron Davidson <ilike...@gmail.com> wrote:
>>>>
>>>>> This was actually a bug in the log message itself, where the Master
>>>>> would print its own ip and port instead of the registered worker's. It has
>>>>> been fixed in 0.9.1 and 1.0.0 (here's the patch:
>>>>> https://github.com/apache/spark/commit/c0795cf481d47425ec92f4fd0780e2e0b3fdda85
>>>>> ).
>>>>>
>>>>> Sorry about the confusion!
>>>>>
>>>>>
>>>>> On Sat, Apr 12, 2014 at 4:36 PM, Mark Baker <m...@coactus.com> wrote:
>>>>>
>>>>>> On Sat, Apr 12, 2014 at 9:19 AM, ge ko <koenig....@gmail.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I'm wondering why the master is registering itself at startup,
>>>>>> > exactly 3 times (the same number as the number of workers). Log excerpt:
>>>>>> > ""
>>>>>> > 2014-04-11 21:08:15,363 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger started
>>>>>> > 2014-04-11 21:08:15,478 INFO Remoting: Starting remoting
>>>>>> > 2014-04-11 21:08:15,838 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077]
>>>>>> > 2014-04-11 21:08:16,252 INFO org.apache.spark.deploy.master.Master: Starting Spark master at spark://hadoop-pg-5.cluster:7077
>>>>>> > 2014-04-11 21:08:16,299 INFO org.eclipse.jetty.server.Server: jetty-7.x.y-SNAPSHOT
>>>>>> > 2014-04-11 21:08:16,341 INFO org.eclipse.jetty.server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:18080
>>>>>> > 2014-04-11 21:08:16,343 INFO org.apache.spark.deploy.master.ui.MasterWebUI: Started Master web UI at http://hadoop-pg-5.cluster:18080
>>>>>> > 2014-04-11 21:08:16,374 INFO org.apache.spark.deploy.master.Master: I have been elected leader! New state: ALIVE
>>>>>> > 2014-04-11 21:08:21,492 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>>>>> > 2014-04-11 21:08:31,362 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>>>>> > 2014-04-11 21:08:34,819 INFO org.apache.spark.deploy.master.Master: Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
>>>>>> > ""
>>>>>> >
>>>>>> > The workers should be hadoop-pg-7/-8/-9. This seems strange to me,
>>>>>> > or am I just misreading the log entries? Perhaps this relates to my
>>>>>> > other post titled "TaskSchedulerImpl: Initial job has not accepted
>>>>>> > any resources", and both issues are caused by the same
>>>>>> > problem/misconfiguration.
>>>>>> >
>>>>>> > All the nodes can reach each other via ping and via telnet on the
>>>>>> > corresponding ports.
>>>>>> >
>>>>>> > Any hints on what is going wrong there?
>>>>>>
>>>>>> I had this happen in a VirtualBox configuration I was testing with,
>>>>>> and never got it fixed, but I suspected it was because I wasn't using
>>>>>> FQDNs (as you aren't either). I had found a message in the archives -
>>>>>> can't find it again now, of course - from somebody suffering from this
>>>>>> issue even while using FQDNs, who claimed they eventually found an
>>>>>> error in /etc/hosts.
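>>>>>>
>>>>>> In case it helps, a sketch of what consistent /etc/hosts entries might
>>>>>> look like for your nodes (the IP addresses are made up, and I'm
>>>>>> assuming the workers share the same .cluster suffix as the master):
>>>>>>
>>>>>>   192.168.1.5   hadoop-pg-5.cluster   hadoop-pg-5
>>>>>>   192.168.1.7   hadoop-pg-7.cluster   hadoop-pg-7
>>>>>>   192.168.1.8   hadoop-pg-8.cluster   hadoop-pg-8
>>>>>>   192.168.1.9   hadoop-pg-9.cluster   hadoop-pg-9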
>>>>>>
>>>>>> Good luck.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
