If you return an Iterable, you are not tying the API to a CompactBuffer. Someday the data could be fetched lazily, and the API would not have to change.

On Apr 23, 2015 6:59 PM, "Dean Wampler" <deanwamp...@gmail.com> wrote:
> I wasn't involved in this decision ("I just make the fries"), but
> CompactBuffer is designed for relatively small data sets that at least fit
> in memory. It's more or less an Array. In principle, returning an Iterable
> could hide the actual data structure that might be needed to hold a much
> bigger data set, if necessary.
>
> HOWEVER, it actually returns a CompactBuffer:
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L444
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Thu, Apr 23, 2015 at 5:46 PM, Hao Ren <inv...@gmail.com> wrote:
>
>> Should I repost this to the dev list?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/why-does-groupByKey-return-RDD-K-Iterable-V-not-RDD-K-CompactBuffer-V-tp22616p22640.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
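To make the trade-off in this thread concrete, here is a minimal plain-Scala sketch with no Spark dependency. `Grouper`, `EagerGrouper`, `LazyGrouper`, `GroupByKeyDemo` and `totals` are hypothetical names made up for illustration, and `ArrayBuffer` merely stands in for `CompactBuffer`; this is not Spark's actual implementation. Both groupers declare the same return type, `Map[K, Iterable[V]]`, so caller code such as `totals` works unchanged whether each group is materialized eagerly or re-scanned lazily. The eager group's runtime class is still the concrete buffer, which mirrors the observation above: the declared type is `Iterable`, but code that downcasts to the buffer would break if the implementation changed.

```scala
import scala.collection.mutable.{ArrayBuffer, LinkedHashMap}

// Hypothetical contract loosely modelled on groupByKey: callers only ever see Iterable[V].
trait Grouper {
  def groupByKey[K, V](pairs: Seq[(K, V)]): Map[K, Iterable[V]]
}

// Eager implementation: each group is materialized in an ArrayBuffer,
// standing in (as an assumption) for the role CompactBuffer plays in Spark.
object EagerGrouper extends Grouper {
  def groupByKey[K, V](pairs: Seq[(K, V)]): Map[K, Iterable[V]] = {
    val groups = LinkedHashMap.empty[K, ArrayBuffer[V]]
    for ((k, v) <- pairs) groups.getOrElseUpdate(k, ArrayBuffer.empty[V]) += v
    groups.toMap
  }
}

// Lazy implementation: nothing is buffered per group; each group re-scans
// the input every time it is iterated, yet the signature is unchanged.
object LazyGrouper extends Grouper {
  def groupByKey[K, V](pairs: Seq[(K, V)]): Map[K, Iterable[V]] =
    pairs.map(_._1).distinct.map { k =>
      val group: Iterable[V] = new Iterable[V] {
        def iterator: Iterator[V] = pairs.iterator.filter(_._1 == k).map(_._2)
      }
      k -> group
    }.toMap
}

object GroupByKeyDemo {
  // Caller code written only against Iterable works with either implementation.
  def totals(g: Grouper): Map[String, Int] =
    g.groupByKey(Seq("a" -> 1, "b" -> 2, "a" -> 3)).map { case (k, vs) => k -> vs.sum }

  def main(args: Array[String]): Unit = {
    println(totals(EagerGrouper))
    println(totals(LazyGrouper))
    // The declared type is Iterable, but the eager runtime object is still a buffer.
    println(EagerGrouper.groupByKey(Seq("a" -> 1))("a").isInstanceOf[ArrayBuffer[_]])
  }
}
```

Because `LazyGrouper` re-scans the input on every traversal, it trades memory for repeated work. Returning `Iterable` rather than `Iterator` keeps each group re-traversable under either implementation.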