Master and 0.8.1 (soon to be released) have `repartition`. It's actually a new feature, not an old one!
On Tue, Dec 17, 2013 at 4:31 PM, Mark Hamstra <[email protected]> wrote:

> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L280
>
>
> On Tue, Dec 17, 2013 at 4:26 PM, Matei Zaharia <[email protected]> wrote:
>>
>> I’m not sure if a method called repartition() ever existed in an official
>> release, since we don’t remove methods, but there is a method called
>> coalesce() that does what you want. You just tell it the desired new
>> number of partitions. You can also have it shuffle the data across the
>> cluster to rebalance it. Take a look at
>> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD.
>>
>> Matei
>>
>> On Dec 17, 2013, at 3:53 PM, Mahdi Namazifar <[email protected]> wrote:
>>
>> > Hi everyone,
>> >
>> > I have a question regarding appending two RDDs using the union
>> > function, and I would appreciate it if anyone could help me with it.
>> >
>> > I have two RDDs (let's call them RDD_1 and RDD_2) with the same number
>> > of partitions (let's say 10), and they are defined based on the rows of
>> > the same set of files that reside on HDFS. In an iterative manner I add
>> > some of the elements of RDD_2 to RDD_1 by
>> >
>> > RDD_1.union(RDD_2.filter(x => <some filter>))
>> >
>> > As a result of the above, at each iteration the number of partitions of
>> > RDD_1 is multiplied by 2 (20, 40, 80, 160, ...), and these new
>> > partitions are generally very small in size. In Spark 0.8.0, is there
>> > any way to avoid this exponential increase in the number of partitions,
>> > or how can I repartition my RDD_1 to have a reasonable number of
>> > partitions after the iterations? Also, is there any other way of
>> > appending two RDDs that would not cause this issue?
>> >
>> > I noticed that in the older versions of Spark a repartition function
>> > existed that has been removed in the current version.
>> >
>> > Thanks,
>> > Mahdi
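For reference, a minimal sketch of the fix Matei describes, using coalesce() after each union to keep the partition count from doubling. This assumes a running SparkContext named `sc` and Spark 0.8's RDD API; the HDFS paths and the filter are hypothetical placeholders for the ones in the question:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

// Stand-ins for RDD_1 and RDD_2 from the question (paths are made up).
var rdd1 = sc.textFile("hdfs:///data/set1")
val rdd2 = sc.textFile("hdfs:///data/set2")

// union() concatenates the partition lists, so rdd1's partition count
// grows by rdd2's on every iteration. Coalescing back to a fixed number
// (10 here, matching the original partitioning) keeps it bounded.
// shuffle = false just merges partitions locally; pass shuffle = true
// if the merged partitions are badly skewed and need rebalancing.
rdd1 = rdd1.union(rdd2.filter(x => x.nonEmpty))
           .coalesce(10, shuffle = false)
```

Calling coalesce inside the loop (rather than once at the end) also keeps each iteration's task count from blowing up.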
