With joins, you can potentially run out of memory on a single executor
because a small skew in your data gets amplified. You could try increasing
the default number of shuffle partitions, reducing the number of
simultaneous tasks per executor (spark.executor.cores), or adding a
repartitioning operation before/after the join, as sketched below.
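
For example, something along these lines (a rough sketch only; the partition
count and the column/DataFrame names are placeholders, not from your job):

  // raise the shuffle parallelism used for DataFrame joins
  sqlContext.setConf("spark.sql.shuffle.partitions", "2000")

  // spread the (possibly skewed) join key over more partitions before joining
  val joined = df.repartition(2000, df("join_key")).join(otherDf, "join_key")

and, on the spark-submit side, fewer concurrent tasks per executor, e.g.
--executor-cores 2 instead of 5.
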
To debug, you could try reducing the number of executors available, so you
can more easily see which job/stage ends up going (b)oom.
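
Since you have dynamic allocation configured, one way to cap the executors
for such a test (values are illustrative):

  --conf spark.dynamicAllocation.maxExecutors=2

or disable it for the run and pin the count explicitly:

  --conf spark.dynamicAllocation.enabled=false --num-executors 2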

On Fri, Apr 14, 2017 at 12:05 AM, Chen, Mingrui <mingr...@mail.smu.edu>
wrote:

> 1.5TB is incredibly high. It doesn't seem to be a configuration problem.
> Could you paste the code snippet doing the loop and join task on the
> dataset?
>
>
> Best regards,
>
> ------------------------------
> *From:* rachmaninovquartet <rachmaninovquar...@gmail.com>
> *Sent:* Thursday, April 13, 2017 10:08:40 AM
> *To:* user@spark.apache.org
> *Subject:* Yarn containers getting killed, error 52, multiple joins
>
> Hi,
>
> I have a Spark 1.6.2 app (tested previously on 2.0.0 as well). It is
> requiring a ton of memory (1.5TB) for a small dataset (~500MB). The memory
> usage seems to jump when I loop through and inner join to make the dataset
> 12 times as wide. The app goes down during or after this loop, when I try
> to run a logistic regression on the generated dataframe. I'm using the
> Scala API (2.10). Dynamic resource allocation is configured. Here are the
> parameters I'm using:
>
> --master yarn-client --queue analyst --executor-cores 5
> --executor-memory 40G --driver-memory 30G --conf spark.memory.fraction=0.75
> --conf spark.yarn.executor.memoryOverhead=5120
>
> Has anyone seen this or have an idea how to tune it? There is no way it
> should need so much memory.
>
> Thanks,
>
> Ian
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Yarn-containers-getting-killed-error-52-multiple-joins-tp28594.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
