combineByKey - Please explain how it works
I am reading about combineByKey and working through the example below from a blog post, but I can't understand how it works step by step. Can someone please explain?

    case class Fruit(kind: String, weight: Int) {
      def makeJuice: Juice = Juice(weight * 100)
    }
    case class Juice(volume: Int) {
      def add(j: Juice): Juice = Juice(volume + j.volume)
    }

    val apple1 = Fruit("Apple", 5)
    val apple2 = Fruit("Apple", 8)
    val orange1 = Fruit("orange", 10)
    val fruit = sc.parallelize(List(("Apple", apple1), ("orange", orange1), ("Apple", apple2)))

    val juice = fruit.combineByKey(
      f => f.makeJuice,                        // createCombiner
      (j: Juice, f) => j.add(f.makeJuice),     // mergeValue
      (j1: Juice, j2: Juice) => j1.add(j2))    // mergeCombiners

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CombineByKey-Please-explain-its-working-tp22203.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
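One way to follow the example step by step is to imitate combineByKey with plain Scala collections, with no Spark involved. The sketch below is only an illustration: the helper simulateCombineByKey and the two-partition layout are hypothetical, not part of the Spark API or the original blog post. It shows the three functions in their usual roles: the first builds a combiner from the first value seen for a key within a partition, the second folds further values for that key in the same partition into the combiner, and the third merges combiners for the same key coming from different partitions.

```scala
case class Juice(volume: Int) {
  def add(j: Juice): Juice = Juice(volume + j.volume)
}
case class Fruit(kind: String, weight: Int) {
  def makeJuice: Juice = Juice(weight * 100)
}

object CombineByKeySketch {
  // Hypothetical helper: imitates combineByKey on in-memory "partitions".
  def simulateCombineByKey[K, V, C](
      partitions: Seq[Seq[(K, V)]],
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C): Map[K, C] = {
    // Step 1: inside each partition, keep one combiner per key.
    val perPartition: Seq[Map[K, C]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
        acc.get(k) match {
          case None    => acc.updated(k, createCombiner(v)) // first value for k here
          case Some(c) => acc.updated(k, mergeValue(c, v))  // later value for k here
        }
      }
    }
    // Step 2: merge the per-partition combiners key by key.
    perPartition.foldLeft(Map.empty[K, C]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, c)) =>
        a.get(k) match {
          case None           => a.updated(k, c)
          case Some(existing) => a.updated(k, mergeCombiners(existing, c))
        }
      }
    }
  }

  val apple1 = Fruit("Apple", 5)
  val apple2 = Fruit("Apple", 8)
  val orange1 = Fruit("orange", 10)

  // Assumed layout: the two apples land in different partitions.
  val partitions: Seq[Seq[(String, Fruit)]] = Seq(
    Seq("Apple" -> apple1, "orange" -> orange1),
    Seq("Apple" -> apple2)
  )

  val juice: Map[String, Juice] = simulateCombineByKey[String, Fruit, Juice](
    partitions,
    f => f.makeJuice,
    (j, f) => j.add(f.makeJuice),
    (j1, j2) => j1.add(j2)
  )
  // juice == Map("Apple" -> Juice(1300), "orange" -> Juice(1000))
}
```

With this layout, apple1 becomes Juice(500) and orange1 becomes Juice(1000) in the first partition, apple2 becomes Juice(800) in the second, and the two Apple combiners are then merged into Juice(1300). If both apples sat in the same partition, the second function would produce the same total without the cross-partition merge.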
How Does aggregate work
Hi,

I am not able to understand how the aggregate function works. Can someone please explain how the result below is produced? I am running Spark on the Cloudera VM.

The result below is 17, but I cannot work out how 17 is calculated:

    val data = sc.parallelize(List(2, 3, 4))
    data.aggregate(0)((x, y) => x + y, (x, y) => 2 + x + y)
    res21: Int = 17

Also, when I set the second parameter of sc.parallelize, I get a different result:

    val data = sc.parallelize(List(2, 3, 4), 2)
    data.aggregate(0)((x, y) => x + y, (x, y) => 2 + x + y)
    res21: Int = 13

Thanks for the help.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-Does-aggregate-work-tp22179.html
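A possible way to reason about the two results is to imitate aggregate with plain Scala collections. The sketch below is an illustration under stated assumptions, not Spark code: the helpers slice and simulateAggregate are hypothetical, and slice assumes that partition i of n elements split p ways covers indices [i * n / p, (i + 1) * n / p). The key point is that the first function folds the elements inside each partition starting from the zero value, and the second function then folds the per-partition results, again starting from the zero value. Because the second function here adds 2 every time it is applied, and it is applied once per partition, the total is 9 + 2 * (number of partitions).

```scala
object AggregateSketch {
  // Hypothetical helper: splits data into numSlices contiguous chunks.
  // Some chunks are empty when there are more slices than elements.
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
    (0 until numSlices).map { i =>
      val start = ((i * data.length.toLong) / numSlices).toInt
      val end = (((i + 1) * data.length.toLong) / numSlices).toInt
      data.slice(start, end)
    }

  // Hypothetical helper: imitates aggregate on in-memory "partitions".
  def simulateAggregate[T, U](partitions: Seq[Seq[T]], zero: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U = {
    // Step 1: fold each partition's elements, starting from zero.
    val partials = partitions.map(_.foldLeft(zero)(seqOp))
    // Step 2: fold the partition results, again starting from zero.
    partials.foldLeft(zero)(combOp)
  }

  val data = List(2, 3, 4)

  // Two partitions: [2] and [3, 4] give partial sums 2 and 7.
  val withTwo: Int =
    simulateAggregate(slice(data, 2), 0)((x, y) => x + y, (x, y) => 2 + x + y)

  // Four partitions: [], [2], [3], [4] give partial sums 0, 2, 3, 4.
  val withFour: Int =
    simulateAggregate(slice(data, 4), 0)((x, y) => x + y, (x, y) => 2 + x + y)
}
```

With two partitions the merge runs as 2 + 0 + 2 = 4, then 2 + 4 + 7 = 13. With four partitions it runs as 2, 6, 11, 17. The value 17 in the first run would therefore be consistent with a default of four partitions on the poster's VM, but the post does not state the default parallelism, so that is an assumption.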
mapPartitions - How does it work?
I am trying to understand mapPartitions, but I am still not sure how it works. In the example below, this creates three partitions:

    val parallel = sc.parallelize(1 to 10, 3)

and when we run

    parallel.mapPartitions(x => List(x.next).iterator).collect

it prints

    Array[Int] = Array(1, 4, 7)

Can someone please explain why it prints only 1, 4, 7?

Thanks,

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/mapPartitions-How-Does-it-Works-tp22123.html
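The behaviour can be imitated with plain Scala collections. The sketch below is an illustration only: slice and simulateMapPartitions are hypothetical helpers, and slice assumes that partition i of n elements split p ways covers indices [i * n / p, (i + 1) * n / p). The function passed to mapPartitions is called once per partition and receives an iterator over that whole partition, so x.next returns only the first element of each partition and the rest are never read.

```scala
object MapPartitionsSketch {
  // Hypothetical helper: splits data into numSlices contiguous chunks.
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
    (0 until numSlices).map { i =>
      val start = ((i * data.length.toLong) / numSlices).toInt
      val end = (((i + 1) * data.length.toLong) / numSlices).toInt
      data.slice(start, end)
    }

  // Hypothetical helper: calls f once per partition with that partition's
  // iterator, then concatenates the outputs (as collect would).
  def simulateMapPartitions[T, U](partitions: Seq[Seq[T]])(
      f: Iterator[T] => Iterator[U]): Seq[U] =
    partitions.flatMap(p => f(p.iterator).toList)

  // Under the slicing assumption: (1, 2, 3), (4, 5, 6), (7, 8, 9, 10)
  val partitions: Seq[Seq[Int]] = slice(1 to 10, 3)

  // Same function as in the post: take only the first element of each partition.
  val firsts: Seq[Int] =
    simulateMapPartitions(partitions)(x => List(x.next()).iterator)

  // For comparison: a function that consumes the whole iterator keeps every element.
  val doubled: Seq[Int] =
    simulateMapPartitions(partitions)(x => x.map(_ * 2))
}
```

Here firsts holds 1, 4, 7, the first element of each of the three partitions. Note that calling next on an empty partition's iterator would throw, so this pattern is only safe when every partition has at least one element.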