Thank you so much, Kevin. My data size is around 4 GB. I am not using collect(), take(), or takeSample(). In the final job, the number of tasks grows to around 200,000.
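For illustration, here is a stripped-down PySpark sketch of the kind of final stage I mean (the app name, input and output paths are placeholders, not my actual job); coalescing before the final action is one way to cut the task count, and with it the per-task bookkeeping the driver holds in memory:

    from pyspark import SparkContext

    sc = SparkContext(appName="task-count-sketch")   # placeholder app name

    rdd = sc.textFile("hdfs:///path/to/input")       # placeholder input path
    print(rdd.getNumPartitions())                    # = number of tasks in the stage that consumes it

    # Coalescing before the final action caps the task count, which also
    # caps the per-task metadata the driver has to track for the stage.
    rdd.coalesce(1000).saveAsTextFile("hdfs:///path/to/output")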
Still, the driver crashes with OOM with the default --driver-memory 1g, but the job succeeds if I specify 2g.

Thanks and regards,
Anand Viswanathan

> On Sep 19, 2016, at 4:00 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:
>
> Hi Anand,
>
> Unfortunately, there is not really a "one size fits all" answer to this
> question; however, here are some things that you may want to consider when
> trying different sizes.
>
> What is the size of the data you are processing? Whenever you invoke an
> action that requires ALL of the data to be sent to the driver (such as
> collect), you'll need to ensure that your memory setting can handle it.
>
> What level of parallelization does your code support? The more processing
> you can do on the worker nodes, the less your driver will need to do.
>
> Related to these comments, keep in mind that the --executor-memory,
> --num-executors, and --executor-cores configurations can be useful when
> tuning the worker nodes. There is some great information in the Spark
> Tuning Guide (linked below) that you may find useful as well.
>
> http://spark.apache.org/docs/latest/tuning.html
>
> Hope that helps!
> Kevin
>
> On Mon, Sep 19, 2016 at 9:32 AM, Anand Viswanathan
> <anand_v...@ymail.com.invalid> wrote:
>
> Hi,
>
> Spark version: spark-1.5.2-bin-hadoop2.6, using pyspark.
>
> I am running a machine learning program, which runs perfectly when I
> specify 2G for --driver-memory. However, the program cannot run with the
> default 1G; the driver crashes with an OOM error.
>
> What is the recommended configuration for --driver-memory? Please suggest.
>
> Thanks and regards,
> Anand.
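P.S. For completeness, the executor knobs Kevin mentions can also be set programmatically through SparkConf; below is a hedged sketch (the app name and values are illustrative, not recommendations). Note that spark.driver.memory only takes effect if it is set before the driver JVM starts, so in practice it is passed on the spark-submit command line (--driver-memory 2g) or set in spark-defaults.conf rather than in application code:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("tuning-sketch")               # placeholder app name
            .set("spark.executor.memory", "4g")        # same knob as --executor-memory
            .set("spark.executor.instances", "10")     # --num-executors (on YARN)
            .set("spark.executor.cores", "2"))         # --executor-cores
    sc = SparkContext(conf=conf)

    # spark.driver.memory is intentionally NOT set here: by the time this
    # code runs, the driver JVM already exists, so pass --driver-memory
    # to spark-submit (or use spark-defaults.conf) instead.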