Master and 0.8.1 (soon to be released) have `repartition`. It's
actually a new feature, not an old one!
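For the pattern described in the original question, the partition growth can be capped by coalescing (or, on master/0.8.1+, repartitioning) after each union. A minimal sketch in local mode, using parallelized ranges and a placeholder filter predicate as stand-ins for the HDFS-backed RDDs (all names here are illustrative, not from the thread):

```scala
import org.apache.spark.SparkContext

object UnionCoalesceSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "union-coalesce-sketch")

    var rdd1 = sc.parallelize(1 to 100, 10)   // stands in for RDD_1
    val rdd2 = sc.parallelize(101 to 200, 10) // stands in for RDD_2

    for (i <- 1 to 5) {
      // union concatenates the partition lists, so the count grows each pass
      rdd1 = rdd1.union(rdd2.filter(x => x % 2 == 0))
      // coalesce back down after each union to keep the count bounded
      rdd1 = rdd1.coalesce(10, shuffle = false)
    }

    println(rdd1.partitions.length) // stays at 10 instead of growing
    sc.stop()
  }
}
```

With shuffle = false, coalesce merges partitions cheaply without moving data across the cluster; repartition(n) on master/0.8.1 is equivalent to coalesce(n, shuffle = true), which redistributes the data evenly at the cost of a shuffle.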

On Tue, Dec 17, 2013 at 4:31 PM, Mark Hamstra <[email protected]> wrote:
> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L280
>
>
> On Tue, Dec 17, 2013 at 4:26 PM, Matei Zaharia <[email protected]>
> wrote:
>>
>> I’m not sure if a method called repartition() ever existed in an official
>> release, since we don’t remove methods, but there is a method called
>> coalesce() that does what you want. You just tell it the desired new number
>> of partitions. You can also have it shuffle the data across the cluster to
>> rebalance it. Take a look at
>> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD.
>>
>> Matei
>>
>> On Dec 17, 2013, at 3:53 PM, Mahdi Namazifar <[email protected]>
>> wrote:
>>
>> > Hi everyone,
>> >
>> > I have a question regarding appending two RDDs using the union function,
>> > and I would appreciate if anyone could help me with it.
>> >
>> > I have two RDDs (let's call them RDD_1 and RDD_2) with the same number
>> > of partitions (let's say 10), both defined over the rows of the same
>> > set of files that reside on HDFS.  In an iterative manner I add some of
>> > the elements of RDD_2 to RDD_1 by
>> >
>> > RDD_1.union(RDD_2.filter(x => <some filter>))
>> >
>> > As a result, at each iteration the number of partitions of RDD_1 is
>> > multiplied by 2 (20, 40, 80, 160, ...), and these new partitions are
>> > generally very small in size.  In Spark 0.8.0, is there any way to
>> > avoid this exponential increase in the number of partitions? If not,
>> > how can I repartition RDD_1 to have a reasonable number of partitions
>> > after the iterations?  Also, is there any other way of appending two
>> > RDDs that would not cause this issue?
>> >
>> > I noticed that in the older versions of Spark a repartition function
>> > existed that has been removed in the current version.
>> >
>> > Thanks,
>> > Mahdi
>>
>