Darren: this is not the last task of the stage.

Thank you,
Abhishek
e: abshkm...@gmail.com
p: 91-8233540996

On Tue, Feb 16, 2016 at 6:52 PM, Darren Govoni <dar...@ontrenet.com> wrote:
> There were some posts in this group about it. Another person also saw the
> deadlock on the next-to-last or last stage task.
>
> I've attached some images I collected showing this problem.
>
> ------- Original Message -------
> On 2/16/2016 07:29 AM Ted Yu wrote:
> Darren:
> Can you post a link to the deadlock issue you mentioned?
>
> Thanks
>
>> On Feb 16, 2016, at 6:55 AM, Darren Govoni <dar...@ontrenet.com> wrote:
>>
>> I think this is part of the bigger issue of serious deadlock conditions
>> occurring in Spark that many of us have posted about.
>>
>> Would the task in question be the last task of a stage, by chance?
>>
>> Sent from my Verizon Wireless 4G LTE smartphone
>>
>> -------- Original message --------
>> From: Abhishek Modi <abshkm...@gmail.com>
>> Date: 02/16/2016 4:12 AM (GMT-05:00)
>> To: user@spark.apache.org
>> Subject: Unusually large deserialisation time
>>
>> I'm doing a mapPartitions on an RDD cached in memory, followed by a
>> reduce. Here is my code snippet:
>>
>> // myRdd is an RDD consisting of Tuple2[Int, Long]
>> myRdd.mapPartitions(rangify).reduce( (x, y) => (x._1 + y._1, x._2 ++ y._2) )
>>
>> // The rangify function
>> def rangify(l: Iterator[ Tuple2[Int, Long] ]): Iterator[ Tuple2[Long, List[ ArrayBuffer[ Tuple2[Long, Long] ] ] ] ] = {
>>   var sum = 0L
>>   val mylist = ArrayBuffer[ Tuple2[Long, Long] ]()
>>
>>   if (l.isEmpty)
>>     return List( (0L, List[ ArrayBuffer[ Tuple2[Long, Long] ] ]()) ).toIterator
>>
>>   var prev = -1000L
>>   var begin = -1000L
>>
>>   for (x <- l) {
>>     sum += x._1
>>
>>     if (prev < 0) {
>>       prev = x._2
>>       begin = x._2
>>     }
>>     else if (x._2 == prev + 1)
>>       prev = x._2
>>     else {
>>       mylist += ((begin, prev))
>>       prev = x._2
>>       begin = x._2
>>     }
>>   }
>>
>>   mylist += ((begin, prev))
>>
>>   List( (sum, List(mylist)) ).toIterator
>> }
>>
>> The RDD is cached in memory. I'm using 20 executors with 1 core for each
>> executor. The cached RDD has 60 blocks. The problem is that for every 2-3
>> runs of the job, there is a task with an abnormally large deserialisation
>> time. Screenshot attached.
>>
>> Thank you,
>> Abhishek
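
A minimal, self-contained sketch of the mapPartitions + reduce job Abhishek
describes, for anyone who wants to try the pattern locally. The SparkContext
setup, the local[2] master, and the tiny sample dataset are assumptions added
to make it runnable; they are not from the original thread, where myRdd is a
60-block cached RDD on a 20-executor cluster.

import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable.ArrayBuffer

object RangifySketch {
  // Per partition: sum the Int fields and collapse consecutive Long
  // fields into (begin, end) ranges, following Abhishek's rangify.
  def rangify(l: Iterator[(Int, Long)]): Iterator[(Long, List[ArrayBuffer[(Long, Long)]])] = {
    var sum = 0L
    val mylist = ArrayBuffer[(Long, Long)]()
    if (l.isEmpty)
      return Iterator((0L, List[ArrayBuffer[(Long, Long)]]()))
    var prev = -1000L
    var begin = -1000L
    for (x <- l) {
      sum += x._1
      if (prev < 0) { prev = x._2; begin = x._2 }
      else if (x._2 == prev + 1) prev = x._2
      else { mylist += ((begin, prev)); prev = x._2; begin = x._2 }
    }
    mylist += ((begin, prev))
    Iterator((sum, List(mylist)))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rangify-sketch").setMaster("local[2]"))
    // Hypothetical input: (count, id) pairs with mostly consecutive ids.
    val myRdd = sc.parallelize(
      Seq((1, 10L), (1, 11L), (1, 12L), (2, 20L), (2, 21L)), 2).cache()
    // The reduce sums the per-partition counts and concatenates the
    // per-partition range lists.
    val (total, ranges) =
      myRdd.mapPartitions(rangify).reduce((x, y) => (x._1 + y._1, x._2 ++ y._2))
    println(s"sum = $total, ranges = $ranges")
    sc.stop()
  }
}

In spark-shell the SparkContext already exists as sc, so only the rangify
definition and the last few lines are needed there.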