Hi Anand,

Unfortunately, there is not really a "one size fits all" answer to this question; however, here are some things you may want to consider when trying different driver-memory sizes.
- What is the size of the data you are processing? Whenever you invoke an
  action that requires ALL of the data to be sent to the driver (such as
  collect), you'll need to ensure that your driver-memory setting can handle
  it (there is a short PySpark sketch at the end of this message that
  illustrates the difference).
- What level of parallelization does your code support? The more processing
  you can do on the worker nodes, the less your driver will need to do.

Related to these comments, keep in mind that the --executor-memory,
--num-executors, and --executor-cores settings can be useful when tuning the
worker nodes (see the example spark-submit command at the end of this
message).

There is some great information in the Spark Tuning Guide (linked below) that
you may find useful as well.
http://spark.apache.org/docs/latest/tuning.html

Hope that helps!
Kevin

On Mon, Sep 19, 2016 at 9:32 AM, Anand Viswanathan
<anand_v...@ymail.com.invalid> wrote:
> Hi,
>
> Spark version: spark-1.5.2-bin-hadoop2.6, using pyspark.
>
> I am running a machine learning program, which runs perfectly when
> specifying 2G for --driver-memory.
> However, the program cannot be run with the default 1G; the driver crashes
> with an OOM error.
>
> What is the recommended configuration for --driver-memory...? Please
> suggest.
>
> Thanks and regards,
> Anand.
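
P.S. To make the collect() point a bit more concrete, here is a rough PySpark
sketch. The input path, RDD name, and key field are made up for illustration;
the point is only the difference between pulling everything back to the
driver and aggregating on the executors.

    from pyspark import SparkContext

    sc = SparkContext(appName="driver-memory-example")

    # Hypothetical input; substitute your own data source.
    records = sc.textFile("hdfs:///path/to/input").map(lambda line: line.split(","))

    # collect() ships every record back to the driver, so --driver-memory
    # has to be large enough to hold the entire dataset at once.
    everything = records.collect()

    # A per-key aggregation keeps the heavy lifting on the executors; only
    # the small summary comes back, so a modest driver-memory setting is
    # usually enough.
    counts = (records
              .map(lambda fields: (fields[0], 1))
              .reduceByKey(lambda a, b: a + b)
              .collect())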
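
Along the same lines, a spark-submit invocation tying the flags together
might look something like the sketch below. The numbers and script name are
only placeholders; the right values depend on your cluster and your data.

    spark-submit \
      --master yarn \
      --driver-memory 2g \
      --executor-memory 4g \
      --num-executors 10 \
      --executor-cores 2 \
      my_ml_job.py

Bumping --driver-memory (as you already did) is the right fix when the driver
genuinely has to hold the data; otherwise it is usually cheaper to push the
work onto the executors.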