Thank you so much, Mich,

I am using YARN as my master.

I found a statement in the Spark docs mentioning that the amount of memory 
needed depends on the individual application:
http://spark.apache.org/docs/1.5.2/hardware-provisioning.html#memory

I guess my assumption that "default resources (memory and cores) can handle any 
application" is wrong.

Thanks and regards,
Anand Viswanathan

> On Sep 19, 2016, at 6:56 PM, Mich Talebzadeh <mich.talebza...@gmail.com> 
> wrote:
> 
> If you make your driver memory too low, it is likely you are going to hit an 
> OOM error.
> 
> You have not mentioned which Spark mode you are using (Local, Standalone, YARN, 
> etc.).
> 
> HTH
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> On 19 September 2016 at 23:48, Anand Viswanathan 
> <anand_v...@ymail.com.invalid> wrote:
> Thank you so much, Kevin.
> 
> My data size is around 4GB.
> I am not using collect(), take() or takeSample().
> In the final job, the number of tasks grows to around 200,000.
> 
> Still, the driver crashes with OOM with the default --driver-memory of 1g, 
> but the job succeeds if I specify 2g.
> 
> Thanks and regards,
> Anand Viswanathan
> 
>> On Sep 19, 2016, at 4:00 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:
>> 
>> Hi Anand,
>> 
>> Unfortunately, there is not really a "one size fits all" answer to this 
>> question; however, here are some things that you may want to consider when 
>> trying different sizes:
>> 
>> - What is the size of the data you are processing?
>> - Whenever you invoke an action that requires ALL of the data to be sent to 
>>   the driver (such as collect), you'll need to ensure that your memory 
>>   setting can handle it.
>> - What level of parallelization does your code support? The more processing 
>>   you can do on the worker nodes, the less your driver will need to do.
>> 
>> Related to these comments, keep in mind that the --executor-memory, 
>> --num-executors, and --executor-cores configurations can be useful when 
>> tuning the worker nodes. There is some great information in the Spark Tuning 
>> Guide (linked below) that you may find useful as well.
>> 
>> http://spark.apache.org/docs/latest/tuning.html
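>> 
>> As a rough illustration (the sizes below are only placeholders and will 
>> depend on your data and cluster; --num-executors applies when running on 
>> YARN), those flags go on the spark-submit command line:
>> 
>>     # placeholder sizes - tune these for your own workload
>>     spark-submit --master yarn \
>>       --driver-memory 2g \
>>       --executor-memory 4g \
>>       --num-executors 10 \
>>       --executor-cores 2 \
>>       your_job.py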
>> 
>> Hope that helps!
>> Kevin
>> 
>> On Mon, Sep 19, 2016 at 9:32 AM, Anand Viswanathan 
>> <anand_v...@ymail.com.invalid> wrote:
>> Hi,
>> 
>> Spark version: spark-1.5.2-bin-hadoop2.6, using pyspark.
>> 
>> I am running a machine learning program, which runs perfectly when I specify 
>> 2G for --driver-memory.
>> However, the program cannot run with the default 1G; the driver crashes with 
>> an OOM error.
>> 
>> What is the recommended configuration for --driver-memory? Please suggest.
>> 
>> Thanks and regards,
>> Anand.
>> 
>> 
> 
> 
