Re: How Does aggregate work

2015-03-23 Thread Paweł Szulc
It is actually number of cores. If your processor has hyperthreading then
it will be more (number of processors your OS sees)

niedz., 22 mar 2015, 4:51 PM Ted Yu użytkownik yuzhih...@gmail.com
napisał:

 I assume spark.default.parallelism is 4 in the VM Ashish was using.

 Cheers



Re: How Does aggregate work

2015-03-22 Thread Ted Yu
I assume spark.default.parallelism is 4 in the VM Ashish was using.

Cheers


How Does aggregate work

2015-03-22 Thread ashish.usoni
Hi , 
I am not able to understand how aggregate function works, Can some one
please explain how below result came 
I am running spark using cloudera VM 

The result in below is 17 but i am not able to find out how it is
calculating 17
val data = sc.parallelize(List(2,3,4))
data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
*res21: Int = 17*

Also when i try to change the 2nd parameter in sc.parallelize i get
different result 

val data = sc.parallelize(List(2,3,4),2)
data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
*res21: Int = 13*

Thanks for the help.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-Does-aggregate-work-tp22179.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How Does aggregate work

2015-03-22 Thread Dean Wampler
2 is added every time the final partition aggregator is called. The result
of summing the elements across partitions is 9 of course. If you force a
single partition (using spark-shell in local mode):

scala val data = sc.parallelize(List(2,3,4),1)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 11

The 2nd function is still called, even though there is only one partition
(presumably either x or y is set to 0).

For every additional partition you specify as the 2nd arg. to parallelize,
the 2nd function will be called again:


scala val data = sc.parallelize(List(2,3,4),1)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 11

scala val data = sc.parallelize(List(2,3,4),2)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 13

scala val data = sc.parallelize(List(2,3,4),3)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 15

scala val data = sc.parallelize(List(2,3,4),4)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 17

Hence, it appears that not specifying the 2nd argument resulted in 4
partitions, even though you only had three elements in the list.

If p_i is the ith partition, the final sum appears to be:

(2 + ... (2 + (2 + (2 + 0 + p_1) + p_2) + p_3) ...)


Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Sun, Mar 22, 2015 at 8:05 AM, ashish.usoni ashish.us...@gmail.com
wrote:

 Hi ,
 I am not able to understand how aggregate function works, Can some one
 please explain how below result came
 I am running spark using cloudera VM

 The result in below is 17 but i am not able to find out how it is
 calculating 17
 val data = sc.parallelize(List(2,3,4))
 data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
 *res21: Int = 17*

 Also when i try to change the 2nd parameter in sc.parallelize i get
 different result

 val data = sc.parallelize(List(2,3,4),2)
 data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
 *res21: Int = 13*

 Thanks for the help.





 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-Does-aggregate-work-tp22179.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org