Re: pyspark/yarn and inconsistent number of executors

2014-08-22 Thread Sandy Ryza
Hi Calvin,

When you say the job gets killed once all the memory in the cluster is
allocated, do you know what's going on?  Spark apps should never be
killed for requesting or using too many resources.  Is there any
associated error message?

Unfortunately, there are currently no tools for tweaking the number of
executors in an automated manner.  An option to use the entire YARN cluster
does seem useful, so I just filed a JIRA for it:
https://issues.apache.org/jira/browse/SPARK-3183.
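
In the meantime, a rough workaround (nothing built into Spark) is to size
--num-executors from the ResourceManager's free capacity right before
submitting. A minimal sketch in Python, assuming the RM REST API is
reachable at rm-host:8088 (placeholder) and ignoring the per-container
memory overhead YARN adds on top of --executor-memory:

# sketch: derive --num-executors from free YARN capacity at submit time
import json
import subprocess
import urllib.request

RM_METRICS_URL = "http://rm-host:8088/ws/v1/cluster/metrics"  # placeholder host
EXECUTOR_MEM_MB = 2048   # should match --executor-memory
EXECUTOR_CORES = 1       # should match --executor-cores
AM_HEADROOM_MB = 3072    # rough headroom for the AM/driver container

with urllib.request.urlopen(RM_METRICS_URL) as resp:
    metrics = json.load(resp)["clusterMetrics"]

avail_mb = metrics["availableMB"] - AM_HEADROOM_MB
avail_cores = metrics["availableVirtualCores"] - 1

num_executors = max(1, min(avail_mb // EXECUTOR_MEM_MB,
                           avail_cores // EXECUTOR_CORES))

subprocess.check_call([
    "spark-submit",
    "--master", "yarn-client",
    "--num-executors", str(num_executors),
    "--executor-cores", str(EXECUTOR_CORES),
    "--executor-memory", "2G",
    "--driver-memory", "3G",
    "count.py", "input", "output",
])

This only samples free capacity at submit time, so it can still over- or
under-shoot if other applications start or finish while the job runs.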

-Sandy


On Tue, Aug 19, 2014 at 12:51 PM, Calvin iphcal...@gmail.com wrote:

 I've set up a YARN (Hadoop 2.4.1) cluster with Spark 1.0.1 and I've
 been seeing inconsistent behavior and out-of-memory errors
 (java.lang.OutOfMemoryError: unable to create new native thread) when
 increasing the number of executors for a simple job (wordcount).

 The general format of my submission is:

 spark-submit \
  --master yarn-client \
  --num-executors=$EXECUTORS \
  --executor-cores 1 \
  --executor-memory 2G \
  --driver-memory 3G \
  count.py input output

 If I run without specifying the number of executors, it defaults to
 two (3 containers: 2 executors, 1 driver). Is there any mechanism to
 let a Spark application scale to the capacity of the YARN cluster
 automatically?

 Similarly, for low numbers of executors I get what I asked for (e.g.,
 requesting 10 executors results in 11 running containers, 20 executors
 in 21 containers, etc.) up to a particular threshold. When I specify 50
 executors, Spark seems to start asking for more and more containers
 until all the memory in the cluster is allocated and the job gets
 killed.

 I don't understand that particular behavior; if anyone has any
 thoughts or experiences to share, that would be great.

 Wouldn't it be preferable to have Spark stop requesting containers if
 the cluster is at capacity rather than kill the job or error out?

 Does anyone have any recommendations on how to tweak the number of
 executors in an automated manner?

 Thanks,
 Calvin





pyspark/yarn and inconsistent number of executors

2014-08-19 Thread Calvin
I've set up a YARN (Hadoop 2.4.1) cluster with Spark 1.0.1 and I've
been seeing inconsistent behavior and out-of-memory errors
(java.lang.OutOfMemoryError: unable to create new native thread) when
increasing the number of executors for a simple job (wordcount).

The general format of my submission is:

spark-submit \
 --master yarn-client \
 --num-executors=$EXECUTORS \
 --executor-cores 1 \
 --executor-memory 2G \
 --driver-memory 3G \
 count.py input output
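
For reference, count.py is just a basic PySpark wordcount, along these
lines (a sketch rather than the exact script; the app name is arbitrary):

# count.py -- basic PySpark wordcount (sketch)
import sys

from pyspark import SparkContext

if __name__ == "__main__":
    input_path, output_path = sys.argv[1], sys.argv[2]
    sc = SparkContext(appName="wordcount")
    counts = (sc.textFile(input_path)
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    counts.saveAsTextFile(output_path)
    sc.stop()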

If I run without specifying the number of executors, it defaults to
two (3 containers: 2 executors, 1 driver). Is there any mechanism to
let a Spark application scale to the capacity of the YARN cluster
automatically?

Similarly, for low numbers of executors I get what I asked for (e.g.,
requesting 10 executors results in 11 running containers, 20 executors
in 21 containers, etc.) up to a particular threshold. When I specify 50
executors, Spark seems to start asking for more and more containers
until all the memory in the cluster is allocated and the job gets
killed.

I don't understand that particular behavior; if anyone has any
thoughts or experiences to share, that would be great.

Wouldn't it be preferable to have Spark stop requesting containers if
the cluster is at capacity rather than kill the job or error out?

Does anyone have any recommendations on how to tweak the number of
executors in an automated manner?

Thanks,
Calvin

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org