Hi Xiaochuan,

The most likely cause of the "Lost executor" errors is that YARN is killing
containers for exceeding memory limits. If this is the case, you should be
able to find instances of "exceeding memory limits" in the application
logs.

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
has a more detailed explanation of why this happens.
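
If that turns out to be the cause, one common mitigation is to give each
executor more off-heap headroom via spark.yarn.executor.memoryOverhead. A
minimal PySpark sketch, assuming a job submitted to YARN (the app name and the
1024 MB value are purely illustrative, not a recommendation):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("overhead-example")  # hypothetical app name
            # extra off-heap memory per executor container, in MB (illustrative)
            .set("spark.yarn.executor.memoryOverhead", "1024"))
    sc = SparkContext(conf=conf)

The aggregated application logs can be pulled with
"yarn logs -applicationId <application id>" and grepped for the message above.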

-Sandy

On Sat, Oct 31, 2015 at 4:29 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> Maybe Hortonworks support can help you better with this.
>
> Otherwise, you may want to look at the YARN scheduler configuration and
> preemption settings. Do you use something like speculative execution?
>
> How do you start the programs? Maybe you are already using all the cores
> of the master...
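>
> For reference, speculative execution is off by default and is toggled by a
> single property; a minimal, purely illustrative PySpark sketch:
>
>     from pyspark import SparkConf, SparkContext
>
>     conf = (SparkConf()
>             .setAppName("speculation-example")  # hypothetical app name
>             # re-launch slow-running tasks on other executors (default: false)
>             .set("spark.speculation", "true"))
>     sc = SparkContext(conf=conf)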
>
> On 30 Oct 2015, at 23:32, YI, XIAOCHUAN <xy1...@att.com> wrote:
>
> Hi
>
> Our team has a 40-node Hortonworks Hadoop cluster, HDP 2.2.4.2-2 (36 data
> nodes), with Apache Spark 1.2 and 1.4 installed.
>
> Each node has 64G RAM and 8 cores.
>
>
>
> We are only able to use <= 72 executors with executor-cores=2, so we only
> get 144 active tasks when running PySpark programs.
>
> [Stage 1:===============>                                    (596 + 144) /
> 2042]
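>
> (For reference, an illustrative reconstruction of the settings described
> above; the app name and submission style are assumptions, but 72 executors
> x 2 cores gives the 144 task slots seen in the progress bar:)
>
>     from pyspark import SparkConf, SparkContext
>
>     conf = (SparkConf()
>             .setAppName("capacity-example")         # hypothetical app name
>             .set("spark.executor.instances", "72")  # equivalent to --num-executors 72
>             .set("spark.executor.cores", "2"))      # 72 executors * 2 cores = 144 slots
>     sc = SparkContext(conf=conf)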
>
> If we use a larger number for --num-executors, the PySpark program exits
> with errors:
>
> ERROR YarnScheduler: Lost executor 113 on hag017.example.com: remote Rpc
> client disassociated
>
>
>
> I tried Spark 1.4 and conf.set("spark.dynamicAllocation.enabled", "true").
> However, it does not help us increase the number of active tasks.
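>
> (For reference, a minimal illustrative sketch of enabling dynamic allocation
> on YARN; it also requires the external shuffle service to be set up on the
> NodeManagers, and the min/max bounds below are assumptions:)
>
>     from pyspark import SparkConf, SparkContext
>
>     conf = (SparkConf()
>             .setAppName("dynamic-allocation-example")  # hypothetical app name
>             .set("spark.dynamicAllocation.enabled", "true")
>             # required so executors can be removed without losing shuffle files
>             .set("spark.shuffle.service.enabled", "true")
>             # illustrative bounds on the executor count
>             .set("spark.dynamicAllocation.minExecutors", "10")
>             .set("spark.dynamicAllocation.maxExecutors", "200"))
>     sc = SparkContext(conf=conf)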
>
> I would expect a larger number of active tasks given the cluster we have.
>
> Could anyone advise on this? Thank you very much!
>
>
>
> Shaun
>
