Re: Re: Unusually large deserialisation time

2016-02-16 Thread Abhishek Modi
PS - I don't get this behaviour in all cases. I did many runs of the same job and I see it in around 40% of them. Task 4 is the bottom row in the metrics table. Thank you, Abhishek e: abshkm...@gmail.com p: 91-8233540996 On Tue, Feb 16, 2016 at 11:19 PM, Abhishek
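The per-task deserialisation time discussed in this thread is also reachable programmatically. A minimal sketch, assuming a Spark 1.x/2.x-style SparkListener API (the listener class name here is hypothetical, not from the original thread):

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs executor deserialisation time for every finished task, so outlier
// tasks (like Task 4 above) can be spotted outside the web UI.
class DeserializeTimeListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {   // metrics can be null for failed tasks
      println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.index} " +
        s"deserializeTimeMs=${metrics.executorDeserializeTime}")
    }
  }
}

// Register on an existing SparkContext before running the job:
// sc.addSparkListener(new DeserializeTimeListener)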

Re: Re: Unusually large deserialisation time

2016-02-16 Thread Abhishek Modi
> Original message from: Abhishek Modi <abshkm...@gmail.com> > Date: 02/16/2016 4:12 AM (GMT-05:00) > To: user@spark.apache.org > Subject: Unusually large deserialisation time

Unusually large deserialisation time

2016-02-16 Thread Abhishek Modi
I'm doing a mapPartitions on an RDD cached in memory, followed by a reduce. Here is my code snippet: // myRdd is an RDD consisting of Tuple2[Int,Long] myRdd.mapPartitions(rangify).reduce((x, y) => (x._1 + y._1, x._2 ++ y._2)) // The rangify function def rangify(l: Iterator[ Tuple2[Int,Long] ]) :
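A hypothetical completion of the truncated rangify above, consistent with the reduce that follows it (one (count, ranges) pair per partition, with consecutive Long values collapsed into ranges); the body is an assumption, since the archive cuts the definition off:

// Hypothetical completion of the truncated snippet above.
def rangify(l: Iterator[Tuple2[Int, Long]]): Iterator[(Int, List[(Long, Long)])] = {
  var count = 0
  var ranges = List.empty[(Long, Long)]
  for ((_, v) <- l) {
    count += 1
    ranges = ranges match {
      case (lo, hi) :: tail if v == hi + 1 => (lo, v) :: tail   // extend the current range
      case _                               => (v, v) :: ranges  // start a new range
    }
  }
  Iterator((count, ranges.reverse))
}

// myRdd.mapPartitions(rangify).reduce((x, y) => (x._1 + y._1, x._2 ++ y._2))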

Abnormally large deserialisation time for some tasks

2016-02-16 Thread Abhishek Modi
I'm doing a mapPartitions on an RDD cached in memory, followed by a reduce. Here is my code snippet: // myRdd is an RDD consisting of Tuple2[Int,Long] myRdd.mapPartitions(rangify).reduce((x, y) => (x._1 + y._1, x._2 ++ y._2)) // The rangify function def rangify(l: Iterator[ Tuple2[Int,Long] ]) :

Read/write metrics for jobs which use S3

2015-06-17 Thread Abhishek Modi
I mostly use Amazon S3 for reading input data and writing output data for my Spark jobs. I want to know the number of bytes read and written by my job to and from S3. In Hadoop, there are FileSystemCounters for this; is there something similar in Spark? If there is, can you please guide me on how to use them?
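Spark does record task-level input/output byte counts, which should cover reads and writes that go through the Hadoop FileSystem layer (including s3n/s3a). A minimal sketch of aggregating them with a SparkListener, assuming a Spark 2.x-style TaskMetrics API where inputMetrics and outputMetrics are always present (the listener name is illustrative):

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sums bytes read and written across all finished tasks.
class IoBytesListener extends SparkListener {
  val bytesRead    = new AtomicLong(0L)
  val bytesWritten = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {   // can be null for some failed tasks
      bytesRead.addAndGet(m.inputMetrics.bytesRead)
      bytesWritten.addAndGet(m.outputMetrics.bytesWritten)
    }
  }
}

// val io = new IoBytesListener
// sc.addSparkListener(io)
// ... run the job ...
// println(s"bytesRead=${io.bytesRead.get} bytesWritten=${io.bytesWritten.get}")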