PS - I don't get this behaviour every time. Across many runs of the same job, it shows up in around 40% of the runs.
Task 4 is the bottom row in the metrics table.
Thank you,
Abhishek
e: abshkm...@gmail.com
p: 91-8233540996
On Tue, Feb 16, 2016 at 11:19 PM, Abhishek Modi <abshkm...@gmail.com> wrote:
I'm doing a mapPartitions on an RDD cached in memory, followed by a reduce. Here is my code snippet:

// myRdd is an RDD of Tuple2[Int, Long]
myRdd.mapPartitions(rangify).reduce( (x, y) => (x._1 + y._1, x._2 ++ y._2) )

// The rangify function
def rangify(l: Iterator[ Tuple2[Int,Long] ]) :
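The definition of rangify is cut off above. Purely as a hedged guess at what it might look like, here is one sketch that is consistent with the reduce: it emits a single (count, list-of-ranges) pair per partition, so the reduce can sum the counts and concatenate the range lists. The return type, the range-collapsing logic, and the assumption that values within a partition arrive in increasing order are all assumptions, not the original code:

def rangify(l: Iterator[Tuple2[Int, Long]]): Iterator[(Int, List[(Long, Long)])] = {
  var count = 0
  // Collapse consecutive Long values into (lo, hi) ranges.
  val ranges = scala.collection.mutable.ListBuffer.empty[(Long, Long)]
  for ((_, v) <- l) {
    count += 1
    ranges.lastOption match {
      // Extend the last range if this value is the next consecutive one.
      case Some((lo, hi)) if v == hi + 1 => ranges(ranges.length - 1) = (lo, v)
      // Otherwise start a new single-element range.
      case _ => ranges += ((v, v))
    }
  }
  // One (count, ranges) pair per partition, merged later by the reduce.
  Iterator((count, ranges.toList))
}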
I mostly use Amazon S3 for reading input data and writing output data in my Spark jobs. I want to know the number of bytes read from and written to S3 by my job.

In Hadoop there are FileSystemCounters for this; is there something similar in Spark? If there is, can you please guide me on how to use it?
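For reference, one hedged way to aggregate this is with a SparkListener that sums the per-task input/output metrics. This is a sketch, not a confirmed answer: it assumes a Spark 2.x-style TaskMetrics API where inputMetrics/outputMetrics are non-optional (on 1.x they are Options), and whether bytesRead actually reflects S3 traffic depends on the Hadoop FileSystem statistics reported for the s3/s3n/s3a scheme:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sums bytes read and written across all finished tasks.
class ByteCountListener extends SparkListener {
  private var read: Long = 0L
  private var written: Long = 0L

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = synchronized {
    val m = taskEnd.taskMetrics
    if (m != null) {
      read += m.inputMetrics.bytesRead
      written += m.outputMetrics.bytesWritten
    }
  }

  def bytesRead: Long = synchronized { read }
  def bytesWritten: Long = synchronized { written }
}

Usage would be to register it before running the job, e.g. sc.addSparkListener(new ByteCountListener), then read the two counters after the job completes.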