Please try playing with spark-defaults.conf for EMR. Dynamic allocation
(spark.dynamicAllocation.enabled = true) is on by default for EMR 4.4 and
above.
What is the EMR version you are using?

http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#d0e20458
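
For reference, a minimal sketch of the relevant spark-defaults.conf
entries (the values below are illustrative placeholders, not tuned
recommendations for your cluster):

    spark.dynamicAllocation.enabled          true
    spark.dynamicAllocation.minExecutors     2
    spark.dynamicAllocation.maxExecutors     100
    spark.shuffle.service.enabled            true

Note that dynamic allocation requires the external shuffle service
(spark.shuffle.service.enabled = true), which EMR also enables by default.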

On Thu, Jan 19, 2017 at 5:02 PM, Venkata D <dvenkatj2ee...@gmail.com> wrote:

> blondowski,
>
> How big is your JSON file? Could you post the Spark params or
> configuration here? That might give us some idea about the issue.
>
> Thanks
>
> On Thu, Jan 19, 2017 at 4:21 PM, blondowski <dan.blondow...@dice.com>
> wrote:
>
>> Please bear with me.. I'm fairly new to Spark. Running PySpark 2.0.1 on
>> AWS EMR (6-node cluster with 475 GB of RAM).
>>
>> We have a job that creates a dataframe from JSON files, then does some
>> manipulation (adds columns), and then calls a UDF.
>>
>> The job fails on the UDF call with "Container killed by YARN for
>> exceeding memory limits. 6.7 GB of 6.6 GB physical memory used. Consider
>> boosting spark.yarn.executor.memoryOverhead."
>>
>> I've tried adjusting executor-memory to 48 GB, but that also failed.
>>
>> What I've noticed is that during the JSON read and dataframe creation it
>> uses 100+ executors, and all of the memory on the cluster is in use.
>>
>> When it gets to the part where it's calling the UDF, it only allocates 3
>> executors, and they all die one by one.
>> Can somebody please explain to me how the executors get allocated?
>>
>
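
One more thing on the error itself: on Spark 2.0.x,
spark.yarn.executor.memoryOverhead is a plain number of megabytes. A
hedged sketch of raising it at submit time (the numbers and the script
name your_job.py are placeholders, not tuned values):

    spark-submit \
      --conf spark.yarn.executor.memoryOverhead=4096 \
      --conf spark.executor.memory=16g \
      your_job.py

Since PySpark UDFs run in Python worker processes outside the JVM heap,
their memory comes out of the overhead allotment; that is usually why
boosting memoryOverhead helps more here than raising executor memory
alone.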


-- 
Regards,
Sanat Patnaik
Cell->804-882-6424
