Hi Anand,

Unfortunately, there is no "one size fits all" answer to this question;
however, here are some things you may want to consider when trying
different --driver-memory settings.

   - What is the size of the data you are processing?
   - Whenever you invoke an action that requires ALL of the data to be sent
   to the driver (such as collect), you'll need to ensure that your
   --driver-memory setting can handle it (see the rough sketch after this
   list).
   - What level of parallelization does your code support? The more
   processing you can do on the worker nodes, the less your driver will need
   to do.
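
For example, here is a rough pyspark sketch of that second point. The input
path and the numbers are made up for illustration, and it assumes an
existing SparkContext "sc" as in the pyspark shell:

    rdd = sc.textFile("hdfs:///path/to/input")  # hypothetical input path

    # collect() ships EVERY record back to the driver, so --driver-memory
    # has to be large enough to hold the whole dataset:
    all_rows = rdd.collect()

    # reduce() keeps the heavy lifting on the executors and only returns a
    # single number to the driver, so the driver stays small:
    total_chars = rdd.map(lambda line: len(line)).reduce(lambda a, b: a + b)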

Related to these comments, keep in mind that the --executor-memory,
--num-executors, and --executor-cores configurations can be useful when
tuning the worker nodes. There is also some great information in the Spark
Tuning Guide (linked below).

http://spark.apache.org/docs/latest/tuning.html
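
For what it's worth, a spark-submit invocation using those flags might look
roughly like the following; the values and the script name are placeholders
to show where the options go, not recommendations for your particular job:

    spark-submit \
      --master yarn \
      --driver-memory 2g \
      --executor-memory 4g \
      --num-executors 10 \
      --executor-cores 2 \
      your_ml_job.py

Increasing --driver-memory mainly helps with driver-side work such as
collect() results and broadcast variables; the executor settings control how
much memory and parallelism the distributed part of the job gets.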

Hope that helps!
Kevin

On Mon, Sep 19, 2016 at 9:32 AM, Anand Viswanathan <
anand_v...@ymail.com.invalid> wrote:

> Hi,
>
> Spark version: spark-1.5.2-bin-hadoop2.6, using pyspark.
>
> I am running a machine learning program, which runs perfectly when I
> specify 2G for --driver-memory.
> However, the program cannot run with the default 1G; the driver crashes
> with an OOM error.
>
> What is the recommended configuration for --driver-memory? Please suggest.
>
> Thanks and regards,
> Anand.
>
>
