Statistical theory behind estimating the total number of tasks in GroupedSumEvaluator.scala

2016-10-02 Thread philipghu
Hi, I've been struggling to understand the statistical theory behind the piece of code below (from /core/src/main/scala/org/apache/spark/partial/GroupedSumEvaluator.scala), especially with respect to estimating the size of the population (total tasks) and its variance. Also I'm trying to under

RDD for loop vs foreach

2016-07-12 Thread philipghu
Hi, I'm new to both Spark and Scala. I understand that we can use foreach to apply a function to each element of an RDD, like rdd.foreach(x => println(x)), but I saw that we can also use a for loop to print each element of an RDD, like for (x <- rdd) { println(x) }. Does defining the foreach function
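A minimal sketch of why both forms behave the same, assuming the standard Scala desugaring rule: the compiler rewrites a `for` loop without `yield` into a call to `foreach`, so `for (x <- rdd) { println(x) }` becomes `rdd.foreach(x => println(x))`. The class `Bag` below is a hypothetical stand-in for an RDD that defines only a `foreach` method; it is not Spark code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an RDD: the only collection method it defines
// is foreach, mirroring the shape of RDD.foreach(f: T => Unit).
final class Bag[A](items: Seq[A]) {
  def foreach(f: A => Unit): Unit = items.foreach(f)
}

object ForDesugarDemo {
  def main(args: Array[String]): Unit = {
    val bag = new Bag(Seq(1, 2, 3))

    // A for loop without yield compiles to bag.foreach(x => ...),
    // so it works even though Bag defines no other methods.
    val viaFor = ArrayBuffer.empty[Int]
    for (x <- bag) { viaFor += x }

    // The explicit foreach call that the loop above is rewritten into.
    val viaForeach = ArrayBuffer.empty[Int]
    bag.foreach(x => viaForeach += x)

    assert(viaFor == viaForeach)
    println(viaFor.mkString(","))
  }
}
```

Since RDD defines `foreach(f: T => Unit)`, the same rewrite applies to an RDD. On a cluster, the function passed to `foreach` runs on the executors, so `println` output appears in the executor logs rather than on the driver.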