Re: WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Aureliano Buendia Tue, 14 Jan 2014 11:03:48 -0800

On Tue, Jan 14, 2014 at 5:52 PM, Christopher Nguyen <[email protected]> wrote:


> Aureliano, this sort of jar-hell is something we have to deal with,
> whether Spark or elsewhere. How would you propose we fix this with Spark?
>
> Do you mean that Spark's own scaffolding caused you to pull in both
> Protobuf 2.4 and 2.5?
>
I simply used the newer protobuf for higher efficiency. I had no idea this
could conflict with Spark.
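
One way to confirm which copy of a conflicting class wins is to ask the JVM
where it actually loaded that class from. The standard application class
loader takes a class from the first classpath entry that contains it, so with
my_app.jar first, its protobuf 2.5 shadows the 2.4 inside the Spark assembly.
Here is a small sketch using only JDK reflection (Class.forName and
ProtectionDomain.getCodeSource). The class name WhichJar is hypothetical, and
it only reports the view of the JVM it runs in, not the executors' classpath.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic: reports which classpath entry supplied a class. */
final class WhichJar {
    private WhichJar() {}

    static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes from the bootstrap loader (e.g. java.lang.*) have no code source
                return "bootstrap/JDK runtime";
            }
            URL location = source.getLocation();
            return location.toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Example (hypothetical paths):
        //   java -cp .:my_app.jar:spark_assembly.jar WhichJar com.google.protobuf.Message
        for (String name : args) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run with the same -cp as the driver, the line for com.google.protobuf.Message
names the jar whose protobuf is actually in use.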

> Or do you mean the error message should have been more helpful?
>
That error is actually a warning, and the warning doesn't even know what
went wrong: it asks the user to check the web UI for two unrelated
things: (1) that the workers are registered and (2) that they have enough
memory:

https://github.com/apache/incubator-spark/blob/fdaabdc67387524ffb84354f87985f48bd31cf60/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L150-L156

In my case, Spark has no idea that Hadoop is failing. I think the above
error checking is weak. If the workers are not registered, Spark should
report so. More importantly, if there is not enough memory, Spark should be
able to report exactly how much memory is potentially needed and, since it
knows all about the allocated resources, it could even tell the user how
large the memory shortfall is.
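
To make the suggestion concrete, here is a minimal sketch of a warning that
names the shortfall instead of listing possible causes. It assumes the
scheduler can see the per-executor memory request and each registered
worker's free memory in MB; MemoryShortfall and diagnose are hypothetical
names, not Spark APIs.

```java
import java.util.List;

/** Hypothetical diagnostic, not part of Spark: a more specific scheduling warning. */
final class MemoryShortfall {
    private MemoryShortfall() {}

    /** Explains why (or whether) an executor of requestedMb can be placed on any worker. */
    static String diagnose(int requestedMb, List<Integer> workerFreeMb) {
        if (workerFreeMb.isEmpty()) {
            return "No workers are registered with the master.";
        }
        int largestFree = 0;
        for (int free : workerFreeMb) {
            if (free >= requestedMb) {
                return "At least one worker can host an executor of " + requestedMb + " MB.";
            }
            largestFree = Math.max(largestFree, free);
        }
        int shortfall = requestedMb - largestFree;
        return "Each executor requests " + requestedMb + " MB, but the largest free worker has only "
                + largestFree + " MB (short by " + shortfall + " MB).";
    }

    public static void main(String[] args) {
        // Two workers with 70 GB free each, executors asking for 75 GB
        System.out.println(diagnose(75 * 1024, List.of(70 * 1024, 70 * 1024)));
    }
}
```

With the numbers in main, the message reports a shortfall of 5120 MB, which is
the kind of detail the current warning leaves the user to work out.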

Another major problem is the settings mess in Spark.

You can set the spark.executor.memory property, or you can set the SPARK_MEM
environment variable.
Setting these does not bind them to the Java heap size, so you need to
set that up too, as spark-class does:
https://github.com/apache/incubator-spark/blob/a862cafacf555373b5fdbafb4c9c4d712b191648/bin/spark-class#L93
Then there is yet another parameter: SPARK_WORKER_MEMORY.

So the user has to fiddle around with many parameters to get rid of that
warning, but even after doing that, it is not clear whether that set of
parameters makes optimal use of the resources. Spark could probably
automate as much of this as possible.
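
For reference, this is how I understand the per-executor request to be
resolved in the 0.8 code (treat it as an assumption and check SparkContext for
your version): the spark.executor.memory system property wins, then SPARK_MEM,
then a 512 MB default. SPARK_WORKER_MEMORY is separate: it sets how much memory
each worker offers, and, as far as I can tell, an executor can only be placed
on a worker whose free memory covers the per-executor request. ExecutorMemory,
parseMb and resolveMb are hypothetical names, and the unit parsing only
approximates Spark's own.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: assumed precedence for the per-executor memory request. */
final class ExecutorMemory {
    private ExecutorMemory() {}

    static final int DEFAULT_MB = 512;

    /** Parses "512m", "2g", "1t" or "1024k" into MB; a bare number is read as bytes. */
    static int parseMb(String value) {
        String s = value.trim().toLowerCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        String digits = s.substring(0, s.length() - 1);
        switch (unit) {
            case 'k': return (int) (Long.parseLong(digits) / 1024);
            case 'm': return (int) Long.parseLong(digits);
            case 'g': return (int) (Long.parseLong(digits) * 1024);
            case 't': return (int) (Long.parseLong(digits) * 1024 * 1024);
            default:  return (int) (Long.parseLong(s) / (1024 * 1024));
        }
    }

    /** System property first, then the environment variable, then the default. */
    static int resolveMb(String sysProp, String envVar) {
        return Optional.ofNullable(sysProp)
                .or(() -> Optional.ofNullable(envVar))
                .map(ExecutorMemory::parseMb)
                .orElse(DEFAULT_MB);
    }

    public static void main(String[] args) {
        int mb = resolveMb(System.getProperty("spark.executor.memory"), System.getenv("SPARK_MEM"));
        System.out.println("Each executor would request " + mb + " MB");
    }
}
```

With neither setting present this prints 512 MB; run with
-Dspark.executor.memory=2g it prints 2048 MB whatever SPARK_MEM says.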

> On Jan 14, 2014 9:27 AM, "Aureliano Buendia" <[email protected]> wrote:
>
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:07 PM, Archit Thakur <[email protected]> wrote:
>>
>>> How much memory are you setting for the executor JVM?
>>> This problem occurs when there is either a communication problem between
>>> Master and Worker, or no memory left. E.g., you specified 75G
>>> for your executor and your machine has 70G of memory.
>>>
>>
>> This was not a memory problem. This could be considered a Spark bug.
>>
>> Here is what happened: my app was using protobuf 2.5, while Spark has a
>> protobuf 2.4 dependency, and the classpath was like this:
>>
>> my_app.jar:spark_assembly.jar:..
>>
>> This caused Spark (or a dependency, probably Hadoop) to use protobuf
>> 2.5, producing that misleading 'ensure that workers are registered and have
>> sufficient memory' error.
>>
>> Reproducing this error is easy: just download protobuf 2.5 and put it at
>> the beginning of the classpath of any app, and you should get that error.
>>
>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 11:27 PM, Aureliano Buendia <[email protected]> wrote:
>>>
>>>> The java command worked when I set SPARK_HOME and SPARK_EXAMPLES_JAR
>>>> values.
>>>>
>>>> There are many issues regarding the 'Initial job has not accepted any
>>>> resources' error, though:
>>>>
>>>>    - When I put my assembly jar *before*
>>>>    spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar, this error
>>>>    happens. If I move my jar after spark-assembly, it works fine.
>>>>    In my case, I need to put my jar before spark-assembly, as my jar
>>>>    uses protobuf 2.5 and spark-assembly comes with protobuf 2.4.
>>>>    - Sometimes when this error happens, the whole cluster must be
>>>>    restarted, or even the run-example script won't work. It took me a
>>>>    while to find this out, which made debugging very time-consuming.
>>>>    - The error message is absolutely irrelevant.
>>>>
>>>> I guess the problem lies somewhere in the SparkContext jar
>>>> delivery part.
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 4:17 PM, Aureliano Buendia <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 5:01 AM, Matei Zaharia <[email protected]> wrote:
>>>>>
>>>>>> Just follow the docs at
>>>>>> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
>>>>>> for how to run an application. Spark is designed so that you can simply run
>>>>>> your application *without* any scripts whatsoever, and submit your JAR to
>>>>>> the SparkContext constructor, which will distribute it. You can launch your
>>>>>> application with “scala”, “java”, or whatever tool you’d prefer.
>>>>>>
>>>>>
>>>>> I'm afraid what you said about 'simply run your application *without*
>>>>> any scripts whatsoever' does not apply to spark at the moment, and it
>>>>> simply does not work.
>>>>>
>>>>> Try this simple Pi calculation on a standard spark-ec2 instance:
>>>>>
>>>>> java -cp
>>>>> /root/spark/examples/target/spark-examples_2.9.3-0.8.1-incubating.jar:/root/spark/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar
>>>>> org.apache.spark.examples.SparkPi `cat spark-ec2/cluster-url`
>>>>>
>>>>> And you'll get the error:
>>>>>
>>>>> WARN cluster.ClusterScheduler: Initial job has not accepted any
>>>>> resources; check your cluster UI to ensure that workers are registered and
>>>>> have sufficient memory
>>>>>
>>>>> While the script way works:
>>>>>
>>>>> spark/run-example org.apache.spark.examples.SparkPi `cat spark-ec2/cluster-url`
>>>>>
>>>>> What am I missing in the above java command?
>>>>>
>>>>>
>>>>>>
>>>>>> Matei
>>>>>>
>>>>>> On Jan 8, 2014, at 8:26 PM, Aureliano Buendia <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 4:11 AM, Matei Zaharia <[email protected]> wrote:
>>>>>>
>>>>>>> Oh, you shouldn’t use spark-class for your own classes. Just build
>>>>>>> your job separately and submit it by running it with “java” and
>>>>>>> creating a SparkContext in it. spark-class is designed to run classes
>>>>>>> internal to the Spark project.
>>>>>>>
>>>>>>
>>>>>> Really? Apparently Eugen runs his jobs with:
>>>>>>
>>>>>> $SPARK_HOME/spark-class SPARK_CLASSPATH=PathToYour.jar com.myproject.MyJob
>>>>>>
>>>>>> as he instructed me to do here:
>>>>>> http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/browser
>>>>>>
>>>>>> I have to say that while the Spark documentation is not sparse, it does
>>>>>> not cover enough, and as you can see, the community is confused.
>>>>>>
>>>>>> Are Spark users supposed to create something like run-example for
>>>>>> their own jobs?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Matei
>>>>>>>
>>>>>>> On Jan 8, 2014, at 8:06 PM, Aureliano Buendia <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 9, 2014 at 3:59 AM, Matei Zaharia <[email protected]> wrote:
>>>>>>>
>>>>>>>> Have you looked at the cluster UI, and do you see any workers
>>>>>>>> registered there, and your application under running applications?
>>>>>>>> Maybe you typed in the wrong master URL or something like that.
>>>>>>>>
>>>>>>>
>>>>>>> No, it's automated: cat spark-ec2/cluster-url
>>>>>>>
>>>>>>> I think the problem might be caused by the spark-class script. It seems
>>>>>>> to assign too much memory.
>>>>>>>
>>>>>>> I forgot that run-example doesn't use spark-class.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Matei
>>>>>>>>
>>>>>>>> On Jan 8, 2014, at 7:07 PM, Aureliano Buendia <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> The strange thing is that the Spark examples work fine, but when I
>>>>>>>> include a Spark example in my jar and deploy it, I get this error for
>>>>>>>> the very same example:
>>>>>>>>
>>>>>>>> WARN ClusterScheduler: Initial job has not accepted any resources;
>>>>>>>> check your cluster UI to ensure that workers are registered and have
>>>>>>>> sufficient memory
>>>>>>>>
>>>>>>>> My jar is deployed to the master and then to the workers by
>>>>>>>> spark-ec2/copy-dir. Why would including the example in my jar cause
>>>>>>>> this error?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 9, 2014 at 12:41 AM, Aureliano Buendia <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Could someone explain how SPARK_MEM, SPARK_WORKER_MEMORY and
>>>>>>>>> spark.executor.memory should be related so that this unhelpful error
>>>>>>>>> doesn't occur?
>>>>>>>>>
>>>>>>>>> Maybe there are more environment and Java config variables about
>>>>>>>>> memory that I'm missing.
>>>>>>>>>
>>>>>>>>> By the way, the bit of the error asking to check the web UI is
>>>>>>>>> just redundant. The UI is of no help.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 8, 2014 at 4:31 PM, Aureliano Buendia <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> My spark cluster is not able to run a job due to this warning:
>>>>>>>>>>
>>>>>>>>>> WARN ClusterScheduler: Initial job has not accepted any
>>>>>>>>>> resources; check your cluster UI to ensure that workers are
>>>>>>>>>> registered and have sufficient memory
>>>>>>>>>>
>>>>>>>>>> The workers have this status:
>>>>>>>>>>
>>>>>>>>>> ALIVE 2 (0 Used) 6.3 GB (0.0 B Used)
>>>>>>>>>>
>>>>>>>>>> So there must be plenty of memory available despite the warning
>>>>>>>>>> message. I'm using the default Spark config; is there a config
>>>>>>>>>> parameter that needs changing for this to work?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
