My other question is: why does Spark provide aggregate but not foldLeft: *def foldLeft[U](zeroValue: U)(op: (U, T) => U): U*? The *def fold(zeroValue: T)(op: (T, T) => T): T* in Spark is not deterministic either.
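A small sketch of why a distributed foldLeft cannot exist with only that signature: each partition can fold its own elements with op: (U, T) => U, but the per-partition results are all of type U, and merging them requires a second function (U, U) => U, which is exactly what aggregate's combop supplies. The partition split below is made up by hand for illustration:

```scala
// Hypothetical sketch: why foldLeft alone cannot be distributed.
object FoldLeftSketch extends App {
  // Assume the data has been split into two partitions (hand-made here).
  val partitions = List(List(1, 2), List(3, 4))

  // Within each partition, op: (U, T) => U works fine (U = String, T = Int).
  val partials: List[String] = partitions.map(_.foldLeft("")(_ + _.toString))

  // But merging the partials needs (String, String) => String,
  // i.e. a combop in aggregate's sense -- foldLeft has no such parameter.
  val combined: String = partials.reduce(_ + _)
  println(combined) // 1234
}
```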
On Thu, Oct 30, 2014 at 3:50 PM, Jason Zaugg <jza...@gmail.com> wrote:

> On Thu, Oct 30, 2014 at 5:39 PM, Xuefeng Wu <ben...@gmail.com> wrote:
>
>> scala> import scala.collection.GenSeq
>> scala> val seq = GenSeq("This", "is", "an", "example")
>>
>> scala> seq.aggregate("0")(_ + _, _ + _)
>> res0: String = 0Thisisanexample
>>
>> scala> seq.par.aggregate("0")(_ + _, _ + _)
>> res1: String = 0This0is0an0example
>
> /** Aggregates the results of applying an operator to subsequent elements.
>  *
>  *  This is a more general form of `fold` and `reduce`. It has similar
>  *  semantics, but does not require the result to be a supertype of the
>  *  element type. It traverses the elements in different partitions
>  *  sequentially, using `seqop` to update the result, and then applies
>  *  `combop` to results from different partitions. The implementation of
>  *  this operation may operate on an arbitrary number of collection
>  *  partitions, so `combop` may be invoked an arbitrary number of times.
>  ...
>  *  @tparam B      the type of accumulated results
>  *  @param z       the initial value for the accumulated result of the partition - this
>  *                 will typically be the neutral element for the `seqop` operator (e.g.
>  *                 `Nil` for list concatenation or `0` for summation) and may be evaluated
>  *                 more than once
>  *  @param seqop   an operator used to accumulate results within a partition
>  *  @param combop  an associative operator used to combine results from different partitions
>  */
> def aggregate[B](z: =>B)(seqop: (B, A) => B, combop: (B, B) => B): B
>
> The contract of aggregate allows for this; if you need deterministic
> results, you need to choose a z that is the neutral element for combop. In
> your example, this would be the empty string.
>
> -jason

-- 
~Yours, Xuefeng Wu/吴雪峰 敬上
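A sketch of why the zero value must be the neutral element of combop. It simulates aggregate over two hand-made partitions; a real parallel collection may split the data differently (and into a different number of partitions), which is exactly where the non-determinism in res1 comes from:

```scala
// Simulated aggregate over a fixed, hand-made partitioning.
object AggregateSketch extends App {
  val partitions = List(List("This", "is"), List("an", "example"))

  def aggregate(z: String)(seqop: (String, String) => String,
                           combop: (String, String) => String): String =
    partitions.map(_.foldLeft(z)(seqop)).reduce(combop)

  // A non-neutral z ("0") gets folded into EVERY partition,
  // so the result depends on how many partitions there are:
  println(aggregate("0")(_ + _, _ + _)) // 0Thisis0anexample

  // The neutral element of string concatenation ("") is safe:
  println(aggregate("")(_ + _, _ + _))  // Thisisanexample
}
```

With z = "", the result is the same no matter how the collection is partitioned, which is the deterministic behaviour the thread is after.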