Hi Josh,
As you say, I've run into the same problem. I recall getting a warning when
processing a very large data set. We also adjust the partition size, but we
do it through command-line options or in code rather than through the
default settings.

Regards,
Takashi

2017-07-18 6:48 GMT+09:00 Josh Holbrook <josh.holbr...@fusion.net>:
> I just ran into this issue! Small world.
>
> As far as I can tell, by default Spark on EMR is completely untuned, but
> it comes with a flag you can set to tell EMR to autotune Spark. In your
> configuration.json file, you can add something like:
>
> {
>   "Classification": "spark",
>   "Properties": {
>     "maximizeResourceAllocation": "true"
>   }
> },
>
> but keep in mind that, again as far as I can tell, the default
> parallelism with this config is merely twice the number of executor
> cores, so for a 10-machine cluster with 3 active cores each, 60
> partitions. This is pretty low, so you'll likely want to adjust it. I'm
> currently using the following, because Spark chokes on datasets that are
> bigger than about 2 GB per partition:
>
> {
>   "Classification": "spark-defaults",
>   "Properties": {
>     "spark.default.parallelism": "1000"
>   }
> }
>
> Good luck, and I hope this is helpful!
>
> --Josh
>
>
> On Mon, Jul 17, 2017 at 4:59 PM, Takashi Sasaki <tsasaki...@gmail.com>
> wrote:
>>
>> Hi Pascal,
>>
>> The error also occurred frequently in our project.
>>
>> As a solution, it was effective to specify the memory size directly
>> with the spark-submit command, e.g.:
>> spark-submit --executor-memory 2g
>>
>> Regards,
>>
>> Takashi
>>
>> 2017-07-18 5:18 GMT+09:00 Pascal Stammer <stam...@deichbrise.de>:
>> >> Hi,
>> >>
>> >> I am running a Spark 2.1.x application on AWS EMR with YARN and get
>> >> the following error, which kills my application:
>> >>
>> >> AM Container for appattempt_1500320286695_0001_000001 exited with
>> >> exitCode: -104
>> >> For more detailed output, check the application tracking page:
>> >> http://ip-172-31-35-192.eu-central-1.compute.internal:8088/cluster/app/application_1500320286695_0001
>> >> Then, click on links to logs of each attempt.
>> >> Diagnostics: Container
>> >> [pid=9216,containerID=container_1500320286695_0001_01_000001] is
>> >> running beyond physical memory limits. Current usage: 1.4 GB of
>> >> 1.4 GB physical memory used; 3.3 GB of 6.9 GB virtual memory used.
>> >> Killing container.
>> >>
>> >> I already changed spark.yarn.executor.memoryOverhead, but the error
>> >> still occurs. Does anybody have a hint about which parameter or
>> >> configuration I have to adapt?
>> >>
>> >> Thank you very much.
>> >>
>> >> Regards,
>> >>
>> >> Pascal Stammer
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
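As a quick sanity check on the numbers in Pascal's diagnostics: in Spark 2.x on YARN, a container's physical-memory limit is the JVM heap plus spark.yarn.executor.memoryOverhead, and the overhead defaults to 10% of the heap with a 384 MB floor. The helper below is an illustrative sketch of that arithmetic (it is not Spark code, just the sizing rule written out):

```python
def yarn_container_limit_mb(heap_mb, overhead_mb=None):
    """Approximate physical-memory limit YARN enforces on a Spark 2.x
    container: JVM heap plus off-heap overhead. When not set explicitly,
    the overhead defaults to max(384 MB, 10% of the heap)."""
    if overhead_mb is None:
        overhead_mb = max(384, int(heap_mb * 0.10))
    return heap_mb + overhead_mb

# With a default 1 GB heap: 1024 + 384 = 1408 MB, i.e. roughly the
# "1.4 GB of 1.4 GB physical memory used" in the error above.
print(yarn_container_limit_mb(1024))
```

This suggests the killed AM container was still running with the default 1 GB heap, which is why raising only memoryOverhead did not help much; increasing the heap itself (e.g. --driver-memory or --executor-memory on spark-submit) moves the limit far more.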