Darren: this is not the last task of the stage.

Thank you,
Abhishek
e: abshkm...@gmail.com
p: 91-8233540996

On Tue, Feb 16, 2016 at 6:52 PM, Darren Govoni <dar...@ontrenet.com> wrote:
> There were some posts in this group about it. Another person also saw the
> deadlock on the next-to-last or last stage task.
>
> I've attached some images I collected showing this problem.
>
> ------- Original Message -------
> On 2/16/2016 07:29 AM Ted Yu wrote:
> Darren:
> Can you post a link to the deadlock issue you mentioned?
>
> Thanks
>
>> On Feb 16, 2016, at 6:55 AM, Darren Govoni <dar...@ontrenet.com> wrote:
>>
>> I think this is part of the bigger issue of serious deadlock conditions
>> occurring in Spark that many of us have posted about.
>>
>> Would the task in question be the last task of a stage, by chance?
>>
>> Sent from my Verizon Wireless 4G LTE smartphone
>>
>> -------- Original message --------
>> From: Abhishek Modi <abshkm...@gmail.com>
>> Date: 02/16/2016 4:12 AM (GMT-05:00)
>> To: user@spark.apache.org
>> Subject: Unusually large deserialisation time
>>
>> I'm doing a mapPartitions on an RDD cached in memory, followed by a
>> reduce. Here is my code snippet:
>>
>> // myRdd is an RDD consisting of Tuple2[Int, Long]
>> myRdd.mapPartitions(rangify).reduce( (x, y) => (x._1 + y._1, x._2 ++ y._2) )
>>
>> // The rangify function
>> def rangify(l: Iterator[ Tuple2[Int, Long] ]): Iterator[ Tuple2[Long, List[ ArrayBuffer[ Tuple2[Long, Long] ] ] ] ] = {
>>   var sum = 0L
>>   val mylist = ArrayBuffer[ Tuple2[Long, Long] ]()
>>
>>   if (l.isEmpty)
>>     return List( (0L, List[ ArrayBuffer[ Tuple2[Long, Long] ] ]()) ).toIterator
>>
>>   var prev = -1000L
>>   var begin = -1000L
>>
>>   for (x <- l) {
>>     sum += x._1
>>
>>     if (prev < 0) {
>>       prev = x._2
>>       begin = x._2
>>     }
>>     else if (x._2 == prev + 1)
>>       prev = x._2
>>     else {
>>       mylist += ((begin, prev))
>>       prev = x._2
>>       begin = x._2
>>     }
>>   }
>>
>>   mylist += ((begin, prev))
>>
>>   List( (sum, List(mylist)) ).toIterator
>> }
>>
>> The RDD is cached in memory. I'm using 20 executors with 1 core for each
>> executor. The cached RDD has 60 blocks. The problem is that for every 2-3
>> runs of the job, there is a task with an abnormally large deserialisation
>> time. Screenshot attached.
>>
>> Thank you,
>> Abhishek
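
A minimal, self-contained sketch of the mapPartitions + reduce job Abhishek
describes, for anyone who wants to try the pattern locally. The SparkContext
setup, the local[2] master, and the tiny sample dataset are assumptions added
to make it runnable; they are not from the original thread, where myRdd is a
60-block cached RDD on a 20-executor cluster.

import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable.ArrayBuffer

object RangifySketch {
  // Per partition: sum the Int fields and collapse consecutive Long
  // fields into (begin, end) ranges, following Abhishek's rangify.
  def rangify(l: Iterator[(Int, Long)]): Iterator[(Long, List[ArrayBuffer[(Long, Long)]])] = {
    var sum = 0L
    val mylist = ArrayBuffer[(Long, Long)]()
    if (l.isEmpty)
      return Iterator((0L, List[ArrayBuffer[(Long, Long)]]()))
    var prev = -1000L
    var begin = -1000L
    for (x <- l) {
      sum += x._1
      if (prev < 0) { prev = x._2; begin = x._2 }
      else if (x._2 == prev + 1) prev = x._2
      else { mylist += ((begin, prev)); prev = x._2; begin = x._2 }
    }
    mylist += ((begin, prev))
    Iterator((sum, List(mylist)))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rangify-sketch").setMaster("local[2]"))
    // Hypothetical input: (count, id) pairs with mostly consecutive ids.
    val myRdd = sc.parallelize(
      Seq((1, 10L), (1, 11L), (1, 12L), (2, 20L), (2, 21L)), 2).cache()
    // The reduce sums the per-partition counts and concatenates the
    // per-partition range lists.
    val (total, ranges) =
      myRdd.mapPartitions(rangify).reduce((x, y) => (x._1 + y._1, x._2 ++ y._2))
    println(s"sum = $total, ranges = $ranges")
    sc.stop()
  }
}

In spark-shell the SparkContext already exists as sc, so only the rangify
definition and the last few lines are needed there.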