Hi Deepak, I'm going to shamelessly plug my blog post on tuning Spark: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
It talks about tuning executor size as well as how the number of tasks for a stage is calculated.

-Sandy

On Thu, Apr 9, 2015 at 9:21 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

> I have a Spark job that has multiple stages. For now I start it with 100
> executors, each with 12G mem (max is 16G). I am using Spark 1.3 over YARN
> 2.4.x.
>
> For now I start the Spark job with a very limited input (1 file of size
> 2G); overall there are 200 files. My first run is yet to complete as it's
> taking too much time / throwing OOM exceptions / buffer exceptions (keep
> that aside).
>
> How will I know how many resources are required to run this job (# of
> cores, executors, mem, serialization buffers, and I do not yet know what
> else)?
>
> In the M/R world, all I do is set the split size and the rest is taken
> care of automatically (yes, I need to worry about memory in case of OOM).
>
> 1) Can someone explain how they do resource estimation before running the
> job, or is there no way and one simply has to try it out?
>
> 2) Even if I give 100 executors, the first stage uses only 5; how did
> Spark decide this?
>
> Please point me to any resources that talk about similar things, or
> please explain here.
>
> --
> Deepak
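For what it's worth, a minimal Scala sketch of the two knobs in question: executor sizing is fixed up front (via SparkConf or the equivalent spark-submit flags), while the task count of an input stage comes from the number of input splits, not from how many executors were requested. The config values, path, and partition counts below are made-up placeholders, not recommendations; the blog post above covers how to actually pick them.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ResourceSketch {
  def main(args: Array[String]): Unit = {
    // Executor count, memory, and cores are declared up front; Spark does not
    // derive them from the input size. Values here are placeholders.
    val conf = new SparkConf()
      .setAppName("resource-estimation-sketch")
      .set("spark.executor.instances", "100") // same as --num-executors on YARN
      .set("spark.executor.memory", "12g")    // heap per executor
      .set("spark.executor.cores", "4")       // concurrent tasks per executor

    val sc = new SparkContext(conf)

    // The number of tasks in the first (input-reading) stage equals the number
    // of input splits, roughly file size / split size. For example, a single 2G
    // file read with 256M splits gives only ~8 tasks, no matter how many
    // executors were requested, so most of the 100 executors would sit idle.
    val lines = sc.textFile("hdfs:///path/to/input") // path is a placeholder

    // To spread later stages across more of the cluster, raise the partition
    // count explicitly (at the cost of a shuffle).
    val widened = lines.repartition(400)

    println(s"input partitions: ${lines.partitions.length}, " +
            s"after repartition: ${widened.partitions.length}")
    sc.stop()
  }
}
```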