Re: WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Aureliano Buendia Tue, 14 Jan 2014 09:29:41 -0800

On Tue, Jan 14, 2014 at 5:07 PM, Archit Thakur <[email protected]> wrote:


> How much memory are you setting for the executor JVM?
> This problem occurs either when there is a communication problem between
> the Master and the Worker, or when there is no memory left. E.g., you
> specified 75G for your executor and your machine has only 70G of memory.
>

This was not a memory problem. It could be considered a Spark bug.

Here is what happened: my app was using protobuf 2.5, while Spark has a
protobuf 2.4 dependency, and the classpath looked like this:

my_app.jar:spark_assembly.jar:..

This caused Spark (or a dependency, probably Hadoop) to use protobuf 2.5,
giving that misleading 'ensure that workers are registered and have
sufficient memory' error.

Reproducing this error is easy: just download protobuf 2.5 and put it at
the beginning of your classpath for any app, and you should get that error.
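
To check whether a classpath has this kind of conflict, it helps to list
which entries ship the same class, in classpath order. With a flat classpath,
the JVM's default application class loader searches entries in order, so the
first entry listed is the one that gets loaded. Below is a minimal Python
sketch of that check. The function name jars_providing is made up for
illustration, and the sketch assumes plain jar files or class directories
(wildcard entries such as lib/* are not expanded).

```python
import os
import zipfile


def jars_providing(classpath, class_name):
    """Return the classpath entries, in order, that contain class_name.

    Hypothetical helper for illustration. The first entry returned is the
    one a flat, in-order classpath lookup would pick.
    """
    # com.google.protobuf.Message -> com/google/protobuf/Message.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in classpath.split(os.pathsep):
        if not path:
            continue
        if os.path.isdir(path):
            # A directory entry holds classes as plain files.
            if os.path.isfile(os.path.join(path, entry)):
                hits.append(path)
        elif zipfile.is_zipfile(path):
            # A jar is a zip archive; look for the class entry inside it.
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(path)
    return hits
```

For a classpath like the one above, calling
jars_providing("my_app.jar:spark_assembly.jar", "com.google.protobuf.Message")
would be expected to list both jars if each bundles protobuf, with my_app.jar
first. More than one hit for the same class is a sign of a possible version
conflict.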


>
>
> On Thu, Jan 9, 2014 at 11:27 PM, Aureliano Buendia 
> <[email protected]> wrote:
>
>> The java command worked when I set SPARK_HOME and SPARK_EXAMPLES_JAR
>> values.
>>
>> There are many issues regarding the Initial job has not accepted any
>> resources... error though:
>>
>>    - When I put my assembly jar *before*
>>    spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar, this error
>>    happens. Moving my jar after the spark-assembly jar works fine.
>>    In my case, I need to put my jar before spark-assembly, as my jar
>>    uses protobuf 2.5 and spark-assembly comes with protobuf 2.4.
>>    - Sometimes when this error happens the whole cluster must be
>>    restarted, otherwise even the run-example script won't work. It took me
>>    a while to find this out, making debugging very time consuming.
>>    - The error message is absolutely irrelevant.
>>
>> I guess the problem lies somewhere in the SparkContext jar
>> delivery part.
>>
>>
>> On Thu, Jan 9, 2014 at 4:17 PM, Aureliano Buendia 
>> <[email protected]> wrote:
>>
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:01 AM, Matei Zaharia 
>>> <[email protected]> wrote:
>>>
>>>> Just follow the docs at
>>>> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
>>>> for how to run an application. Spark is designed so that you can simply run
>>>> your application *without* any scripts whatsoever, and submit your JAR to
>>>> the SparkContext constructor, which will distribute it. You can launch your
>>>> application with “scala”, “java”, or whatever tool you’d prefer.
>>>>
>>>
>>> I'm afraid what you said about 'simply run your application *without*
>>> any scripts whatsoever' does not apply to spark at the moment, and it
>>> simply does not work.
>>>
>>> Try this simple Pi calculation on a standard spark-ec2 instance:
>>>
>>> java -cp
>>> /root/spark/examples/target/spark-examples_2.9.3-0.8.1-incubating.jar:/root/spark/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar
>>> org.apache.spark.examples.SparkPi `cat spark-ec2/cluster-url`
>>>
>>> And you'll get the error:
>>>
>>> WARN cluster.ClusterScheduler: Initial job has not accepted any
>>> resources; check your cluster UI to ensure that workers are registered and
>>> have sufficient memory
>>>
>>> While the script way works:
>>>
>>> spark/run-example org.apache.spark.examples.SparkPi `cat
>>> spark-ec2/cluster-url`
>>>
>>> What am I missing in the above java command?
>>>
>>>
>>>>
>>>> Matei
>>>>
>>>> On Jan 8, 2014, at 8:26 PM, Aureliano Buendia <[email protected]>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 4:11 AM, Matei Zaharia 
>>>> <[email protected]> wrote:
>>>>
>>>>> Oh, you shouldn’t use spark-class for your own classes. Just build
>>>>> your job separately and submit it by running it with “java” and creating a
>>>>> SparkContext in it. spark-class is designed to run classes internal to the
>>>>> Spark project.
>>>>>
>>>>
>>>> Really? Apparently Eugen runs his jobs by:
>>>>
>>>> $SPARK_HOME/spark-class SPARK_CLASSPATH=PathToYour.jar com.myproject.MyJob
>>>>
>>>> as he instructed me to do here:
>>>> <http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/browser>
>>>>
>>>> I have to say that while the Spark documentation is not sparse, it does
>>>> not cover enough, and as you can see the community is confused.
>>>>
>>>> Are Spark users supposed to create something like run-example for
>>>> their own jobs?
>>>>
>>>>
>>>>>
>>>>> Matei
>>>>>
>>>>> On Jan 8, 2014, at 8:06 PM, Aureliano Buendia <[email protected]>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 3:59 AM, Matei Zaharia <[email protected]> wrote:
>>>>>
>>>>>> Have you looked at the cluster UI, and do you see any workers
>>>>>> registered there, and your application under running applications? Maybe
>>>>>> you typed in the wrong master URL or something like that.
>>>>>>
>>>>>
>>>>> No, it's automated: cat spark-ec2/cluster-url
>>>>>
>>>>> I think the problem might be caused by the spark-class script. It seems
>>>>> to assign too much memory.
>>>>>
>>>>> I forgot the fact that run-example doesn't use spark-class.
>>>>>
>>>>>
>>>>>>
>>>>>> Matei
>>>>>>
>>>>>> On Jan 8, 2014, at 7:07 PM, Aureliano Buendia <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> The strange thing is that spark examples work fine, but when I
>>>>>> include a spark example in my jar and deploy it, I get this error for the
>>>>>> very same example:
>>>>>>
>>>>>> WARN ClusterScheduler: Initial job has not accepted any resources;
>>>>>> check your cluster UI to ensure that workers are registered and have
>>>>>> sufficient memory
>>>>>>
>>>>>> My jar is deployed to master and then to workers by
>>>>>> spark-ec2/copy-dir. Why would including the example in my jar cause this
>>>>>> error?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 12:41 AM, Aureliano Buendia <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Could someone explain how SPARK_MEM, SPARK_WORKER_MEMORY and
>>>>>>> spark.executor.memory should be related so that this unhelpful error
>>>>>>> doesn't occur?
>>>>>>>
>>>>>>> Maybe there are more env and Java config variables about memory that
>>>>>>> I'm missing.
>>>>>>>
>>>>>>> By the way, the part of the error asking to check the web UI is
>>>>>>> just redundant. The UI is of no help.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 8, 2014 at 4:31 PM, Aureliano Buendia <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> My spark cluster is not able to run a job due to this warning:
>>>>>>>>
>>>>>>>> WARN ClusterScheduler: Initial job has not accepted any resources;
>>>>>>>> check your cluster UI to ensure that workers are registered and have
>>>>>>>> sufficient memory
>>>>>>>>
>>>>>>>> The workers have this status:
>>>>>>>>
>>>>>>>> ALIVE   2 (0 Used)   6.3 GB (0.0 B Used)
>>>>>>>>
>>>>>>>> So there must be plenty of memory available despite the warning
>>>>>>>> message. I'm using the default Spark config; is there a config
>>>>>>>> parameter that needs changing for this to work?
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
