Thank you so much, Kevin.

My data size is around 4GB.
I am not using collect(), take() or takeSample().
In the final job, the number of tasks grows to 200,000.
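
To put those two numbers side by side, here is a rough back-of-the-envelope sketch. The helper names and the 128 MB target partition size are illustrative assumptions, not values measured in my job. It only shows that 4GB spread over 200,000 tasks means very small tasks, which may be part of why the driver needs more memory to keep track of them.

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2


def avg_bytes_per_task(total_bytes, num_tasks):
    """Average amount of input data handled by one task."""
    return total_bytes / num_tasks


def partitions_for_target(total_bytes, target_partition_bytes):
    """Number of partitions needed so each holds at most the target size."""
    return math.ceil(total_bytes / target_partition_bytes)


data_bytes = 4 * GIB      # "around 4GB" from this thread
final_job_tasks = 200000  # task count observed in the final job

# Average data per task with the current task count (in bytes).
per_task = avg_bytes_per_task(data_bytes, final_job_tasks)

# Partition count if each partition held about 128 MiB
# (an assumed target, chosen only for illustration).
suggested = partitions_for_target(data_bytes, 128 * MIB)

print("bytes per task: %.0f" % per_task)
print("partitions at 128 MiB each: %d" % suggested)
```

If the estimate holds, repartitioning to far fewer partitions before the final job might be worth trying, though I have not verified that on this workload.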

Still, the driver crashes with OOM with the default --driver-memory 1g, but the job 
succeeds if I specify 2g.
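
For reference, this is roughly how the two runs differ on the command line. It is a minimal sketch: the helper `build_spark_submit_args` and the script name `my_ml_job.py` are hypothetical, and only the flags already mentioned in this thread are used.

```python
def build_spark_submit_args(script, driver_memory="1g", executor_memory=None,
                            num_executors=None, executor_cores=None):
    """Build a spark-submit argument list (hypothetical helper).

    Only flags named in this thread are emitted; options left as None
    are omitted so that Spark falls back to its own defaults.
    """
    args = ["spark-submit", "--driver-memory", driver_memory]
    if executor_memory is not None:
        args += ["--executor-memory", executor_memory]
    if num_executors is not None:
        args += ["--num-executors", str(num_executors)]
    if executor_cores is not None:
        args += ["--executor-cores", str(executor_cores)]
    args.append(script)
    return args


# Run that crashes for me: default 1g driver memory.
failing = build_spark_submit_args("my_ml_job.py")

# Run that succeeds: 2g driver memory.
working = build_spark_submit_args("my_ml_job.py", driver_memory="2g")

print(" ".join(failing))
print(" ".join(working))
```

The resulting list could be passed to `subprocess.call` or simply joined and pasted into a shell.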

Thanks and regards,
Anand Viswanathan

> On Sep 19, 2016, at 4:00 PM, Kevin Mellott wrote:
> Hi Anand,
> Unfortunately, there is not really a "one size fits all" answer to this 
> question; however, here are some things that you may want to consider when 
> trying different sizes.
> What is the size of the data you are processing?
> Whenever you invoke an action that requires ALL of the data to be sent to the 
> driver (such as collect), you'll need to ensure that your memory setting can 
> handle it.
> What level of parallelization does your code support? The more processing you 
> can do on the worker nodes, the less your driver will need to do.
> Related to these comments, keep in mind that the --executor-memory, 
> --num-executors, and --executor-cores configurations can be useful when 
> tuning the worker nodes. There is some great information in the Spark Tuning 
> Guide (linked below) that you may find useful as well.
> <>
> Hope that helps!
> Kevin
> On Mon, Sep 19, 2016 at 9:32 AM, Anand Viswanathan wrote:
> Hi,
> Spark version: spark-1.5.2-bin-hadoop2.6, using pyspark.
> I am running a machine learning program, which runs perfectly when I specify 
> 2G for --driver-memory.
> However, the program cannot run with the default 1G; the driver crashes with 
> an OOM error.
> What is the recommended configuration for --driver-memory? Please suggest.
> Thanks and regards,
> Anand.
