Re: multiple passes in mapPartitions

2014-07-01 Thread Frank Austin Nothaft
Hi Zhen, The Scala iterator trait supports cloning via the duplicate method (http://www.scala-lang.org/api/current/index.html#scala.collection.Iterator@duplicate:(Iterator[A],Iterator[A])). Regards, Frank Austin Nothaft fnoth...@berkeley.edu fnoth...@eecs.berkeley.edu 202-340-0466 On Jun 13, 2

Re: multiple passes in mapPartitions

2014-07-01 Thread Chris Fregly
also, multiple calls to mapPartitions() will be pipelined by the spark execution engine into a single stage, so the overhead is minimal. On Fri, Jun 13, 2014 at 9:28 PM, zhen wrote: > Thank you for your suggestion. We will try it out and see how it performs. > We > think the single call to mapP

Re: multiple passes in mapPartitions

2014-06-13 Thread zhen
Thank you for your suggestion. We will try it out and see how it performs. We think the single call to mapPartitions will be faster but we could be wrong. It would be nice to have a "clone method" on the iterator. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.

Re: multiple passes in mapPartitions

2014-06-13 Thread Mayur Rustagi
Sorry if this is a dumb question but why not several calls to map-partitions sequentially. Are you looking to avoid function serialization or is your function damaging partitions? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi