Hi Deepak,

I'm going to shamelessly plug my blog post on tuning Spark:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

It talks about tuning executor size as well as how the number of tasks for
a stage is calculated.
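
As a rough illustration, here is a minimal Scala sketch of the knobs that
control executor sizing on YARN. The property names are the standard
Spark-on-YARN settings for 1.3; the values and app name are placeholders,
not recommendations, and the same settings are more commonly passed as
spark-submit flags (--num-executors, --executor-memory, --executor-cores):

  // Sketch only: sizing executors explicitly for a YARN deployment.
  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("resource-sizing-sketch")               // placeholder app name
    .set("spark.executor.instances", "100")             // same as --num-executors
    .set("spark.executor.memory", "12g")                // heap per executor (--executor-memory)
    .set("spark.executor.cores", "4")                   // concurrent tasks per executor (placeholder)
    .set("spark.yarn.executor.memoryOverhead", "1024")  // MB of off-heap headroom per container
    .set("spark.kryoserializer.buffer.max.mb", "256")   // worth raising if Kryo buffer errors appear
  val sc = new SparkContext(conf)

Each YARN container has to fit the executor heap plus the overhead, so a
12G heap with 1G of overhead needs to stay under the 16G container limit
mentioned below.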

-Sandy

On Thu, Apr 9, 2015 at 9:21 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

> I have a Spark job that has multiple stages. For now I start it with 100
> executors, each with 12G memory (the max is 16G). I am using Spark 1.3 on
> YARN 2.4.x.
>
> For now I start the Spark job with very limited input (1 file of size 2G;
> overall there are 200 files). My first run has yet to complete, as it is
> taking too much time / throwing OOM exceptions / buffer exceptions (keep
> that aside).
>
> How will I know how many resources are required to run this job (# of
> cores, executors, memory, serialization buffers, and I do not yet know
> what else)?
>
> In the M/R world, all I do is set the split size and the rest is taken
> care of automatically (yes, I still need to worry about memory in case of
> OOM).
>
>
> 1) Can someone explain how they do resource estimation before running the
> job, or is there no way and one simply has to try it out?
> 2) Even if I give 100 executors, the first stage uses only 5. How did
> Spark decide this?
>
> Please point me to any resources that also talk about similar things, or
> please explain here.
>
> --
> Deepak
>
>
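
On question 2 above: the number of tasks in a stage that reads from HDFS is
driven by the number of input splits (roughly file size divided by block
size), not by the number of executors, so a single 2G file can yield far
fewer tasks than 100 executors can run. A minimal Scala sketch, with a
hypothetical path and placeholder partition counts, of how to check and
widen that parallelism:

  // Sketch only: the first stage has one task per partition of the input RDD,
  // and for HDFS input the partition count comes from the file's splits.
  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("partition-check"))  // placeholder app name
  val input = sc.textFile("hdfs:///path/to/one-2g-file")                    // hypothetical path
  println(s"tasks in the first stage: ${input.partitions.length}")

  // To spread the work over more of the 100 executors, ask for more
  // partitions up front, or repartition before the expensive stages.
  val wider = sc.textFile("hdfs:///path/to/one-2g-file", 100)               // minPartitions hint
  val reshuffled = input.repartition(100)                                   // adds a shuffle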
