2 is added every time the final partition aggregator is called. The result of summing the elements across partitions is 9 of course. If you force a single partition (using spark-shell in local mode):
scala> val data = sc.parallelize(List(2,3,4),1) scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y) res0: Int = 11 The 2nd function is still called, even though there is only one partition (presumably either x or y is set to 0). For every additional partition you specify as the 2nd arg. to parallelize, the 2nd function will be called again: scala> val data = sc.parallelize(List(2,3,4),1) scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y) res0: Int = 11 scala> val data = sc.parallelize(List(2,3,4),2) scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y) res0: Int = 13 scala> val data = sc.parallelize(List(2,3,4),3) scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y) res0: Int = 15 scala> val data = sc.parallelize(List(2,3,4),4) scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y) res0: Int = 17 Hence, it appears that not specifying the 2nd argument resulted in 4 partitions, even though you only had three elements in the list. If p_i is the ith partition, the final sum appears to be: (2 + ... (2 + (2 + (2 + 0 + p_1) + p_2) + p_3) ...) Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly) Typesafe <http://typesafe.com> @deanwampler <http://twitter.com/deanwampler> http://polyglotprogramming.com On Sun, Mar 22, 2015 at 8:05 AM, ashish.usoni <ashish.us...@gmail.com> wrote: > Hi , > I am not able to understand how aggregate function works, Can some one > please explain how below result came > I am running spark using cloudera VM > > The result in below is 17 but i am not able to find out how it is > calculating 17 > val data = sc.parallelize(List(2,3,4)) > data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y) > *res21: Int = 17* > > Also when i try to change the 2nd parameter in sc.parallelize i get > different result > > val data = sc.parallelize(List(2,3,4),2) > data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y) > *res21: Int = 13* > > Thanks for the help. > > > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/How-Does-aggregate-work-tp22179.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >