Unfortunately, there is not really a "one size fits all" answer to this
question; however, here are some things that you may want to consider when
trying different sizes.
- What is the size of the data you are processing?
- Whenever you invoke an action that requires ALL of the data to be sent
to the driver (such as collect), you'll need to ensure that your memory
setting can handle it.
- What level of parallelization does your code support? The more
processing you can do on the worker nodes, the less your driver will need
to do.
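
To make the last two points a bit more concrete, here is a minimal sketch. It assumes `rdd` is an existing PySpark RDD of numbers; the helper names `mean_on_workers` and `peek` are made up for illustration and are not part of Spark. The idea is to do the aggregation on the executors and bring only a small result back to the driver, rather than calling collect on the whole dataset.

```python
def mean_on_workers(rdd):
    """Compute a mean without collecting the dataset to the driver.

    `rdd` is assumed to be a PySpark RDD of numbers. The map and the
    partial reductions run on the executors; only one small
    (sum, count) pair is returned to the driver.
    """
    total, count = (
        rdd.map(lambda x: (x, 1))
           .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    )
    # float() keeps the division exact on Python 2 as well as Python 3.
    return float(total) / count


def peek(rdd, n=5):
    """Inspect a few records with take(n) instead of collect()."""
    return rdd.take(n)
```

If your program really does need the full result on the driver, then the driver memory has to be sized for it; otherwise a pattern like this may let you stay closer to the default.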
Related to these comments, keep in mind that the --executor-memory,
--num-executors, and --executor-cores configurations can be useful when
tuning the worker nodes. There is some great information in the Spark
Tuning Guide (linked below) that you may find useful as well.
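
In case it helps when trying different sizes, below is a small sketch that assembles a spark-submit command line from those settings. The function name `build_spark_submit`, the script name `my_job.py`, and all of the values are placeholders I picked for illustration, not recommendations; also, as far as I know, --num-executors only applies when running on YARN.

```python
def build_spark_submit(script, driver_memory="2g", executor_memory=None,
                       num_executors=None, executor_cores=None):
    """Assemble a spark-submit argument list.

    All values are placeholders to tune for your own job; options left
    as None are omitted so Spark falls back to its defaults.
    """
    cmd = ["spark-submit", "--driver-memory", driver_memory]
    if executor_memory is not None:
        cmd += ["--executor-memory", executor_memory]
    if num_executors is not None:
        cmd += ["--num-executors", str(num_executors)]
    if executor_cores is not None:
        cmd += ["--executor-cores", str(executor_cores)]
    cmd.append(script)
    return cmd


# Print one candidate command line so it can be copied into a shell.
print(" ".join(build_spark_submit("my_job.py", executor_memory="4g",
                                  num_executors=4, executor_cores=2)))
```

Keeping the settings in one place like this makes it easier to change one value at a time and see which change actually makes the OOM go away.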
Hope that helps!
On Mon, Sep 19, 2016 at 9:32 AM, Anand Viswanathan <
> Spark version: spark-1.5.2-bin-hadoop2.6, using pyspark.
> I am running a machine learning program, which runs perfectly by
> specifying 2G for --driver-memory.
> However, the program cannot be run with the default 1G; the driver crashes with OOM.
> What is the recommended configuration for --driver-memory? Please suggest.
> Thanks and regards,