Thank you so much, Kevin.
My data size is around 4GB.
I am not using collect(), take(), or takeSample().
In the final job, the number of tasks grows to about 200,000.
Still, the driver crashes with OOM with the default --driver-memory 1g, but the
job succeeds if I specify 2g.
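For reference, this is the shape of the submit command that works here (the script name is a placeholder, not the real one):

```shell
# Illustrative only: my_ml_job.py stands in for the actual PySpark script.
# With the default --driver-memory 1g the driver OOMs; 2g succeeds.
spark-submit --driver-memory 2g my_ml_job.py
```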
Thanks and regards,
> On Sep 19, 2016, at 4:00 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:
> Hi Anand,
> Unfortunately, there is not really a "one size fits all" answer to this
> question; however, here are some things that you may want to consider when
> trying different sizes.
> What is the size of the data you are processing?
> Whenever you invoke an action that requires ALL of the data to be sent to the
> driver (such as collect), you'll need to ensure that your memory setting can
> handle it.
> What level of parallelization does your code support? The more processing you
> can do on the worker nodes, the less your driver will need to do.
> Related to these comments, keep in mind that the --executor-memory,
> --num-executors, and --executor-cores configurations can be useful when
> tuning the worker nodes. There is some great information in the Spark Tuning
> Guide (linked below) that you may find useful as well.
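> To make the flags concrete, a hedged sketch of a submit command combining them (all values are placeholders to illustrate the syntax, not recommendations):
>
> ```shell
> # Example spark-submit invocation; tune each value for your cluster and workload.
> spark-submit \
>   --driver-memory 2g \
>   --executor-memory 4g \
>   --num-executors 10 \
>   --executor-cores 2 \
>   your_job.py
> ```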
> Hope that helps!
> On Mon, Sep 19, 2016 at 9:32 AM, Anand Viswanathan
> <anand_v...@ymail.com.invalid <mailto:anand_v...@ymail.com.invalid>> wrote:
> Spark version :spark-1.5.2-bin-hadoop2.6 ,using pyspark.
> I am running a machine learning program, which runs perfectly when I specify
> 2G for --driver-memory.
> However, the program cannot run with the default 1G; the driver crashes with OOM.
> What is the recommended configuration for --driver-memory? Please suggest.
> Thanks and regards,