Thank you so much Mich, I am using YARN as my master.
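
For reference, this is roughly how I am submitting the job (the script name below is just a placeholder for my actual program):

  # pyspark job submitted on YARN; 2g of driver memory avoids the OOM
  spark-submit --master yarn --driver-memory 2g ml_job.py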
I found a statement in the Spark docs mentioning that the amount of memory needed depends on the individual application: http://spark.apache.org/docs/1.5.2/hardware-provisioning.html#memory

I guess my assumption that "default resources (memory and cores) can handle any application" is wrong.

Thanks and regards,
Anand Viswanathan

> On Sep 19, 2016, at 6:56 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> If you make your driver memory too low it is likely you are going to hit an OOM error.
>
> You have not mentioned which Spark mode you are using (Local, Standalone, Yarn etc).
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On 19 September 2016 at 23:48, Anand Viswanathan <anand_v...@ymail.com.invalid> wrote:
> Thank you so much, Kevin.
>
> My data size is around 4 GB.
> I am not using collect(), take() or takeSample().
> At the final job, the number of tasks grows up to 200,000.
>
> Still, the driver crashes with OOM at the default --driver-memory 1g, but the job succeeds if I specify 2g.
>
> Thanks and regards,
> Anand Viswanathan
>
>> On Sep 19, 2016, at 4:00 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:
>>
>> Hi Anand,
>>
>> Unfortunately, there is not really a "one size fits all" answer to this question; however, here are some things that you may want to consider when trying different sizes.
>>
>> - What is the size of the data you are processing? Whenever you invoke an action that requires ALL of the data to be sent to the driver (such as collect), you'll need to ensure that your memory setting can handle it.
>> - What level of parallelization does your code support? The more processing you can do on the worker nodes, the less your driver will need to do.
>>
>> Related to these comments, keep in mind that the --executor-memory, --num-executors, and --executor-cores configurations can be useful when tuning the worker nodes. There is some great information in the Spark Tuning Guide (linked below) that you may find useful as well.
>>
>> http://spark.apache.org/docs/latest/tuning.html
>>
>> Hope that helps!
>> Kevin
>>
>> On Mon, Sep 19, 2016 at 9:32 AM, Anand Viswanathan <anand_v...@ymail.com.invalid> wrote:
>> Hi,
>>
>> Spark version: spark-1.5.2-bin-hadoop2.6, using pyspark.
>>
>> I am running a machine learning program, which runs perfectly when I specify 2G for --driver-memory.
>> However, the program cannot be run with the default 1G; the driver crashes with an OOM error.
>>
>> What is the recommended configuration for --driver-memory? Please suggest.
>>
>> Thanks and regards,
>> Anand.
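
(One more note for anyone who finds this thread later: below is a sketch of how the worker-side flags Kevin mentioned combine with --driver-memory in a spark-submit call on YARN. The values and script name are purely illustrative assumptions, not recommendations; they need to be tuned to your own data size and cluster.)

  # illustrative values and script name only; tune for your own data and cluster
  spark-submit --master yarn \
    --driver-memory 2g \
    --num-executors 4 \
    --executor-cores 2 \
    --executor-memory 4g \
    ml_job.py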