Hi Anastasios,

Thanks for the information. I will look into the CoalescedRDD code.

Thanks,
Fei

On Sun, Jan 15, 2017 at 12:21 PM, Anastasios Zouzias <zouz...@gmail.com> wrote:

> Hi Fei,
>
> I looked at the code of CoalescedRDD, and what I suggested will probably
> not work.
>
> Speaking of which, CoalescedRDD is private[spark]. If this were not the
> case, you could set balanceSlack to 1 and get what you requested; see
>
> https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala#L75
>
> Maybe you could try to use the CoalescedRDD code to implement your
> requirement.
>
> Good luck!
> Cheers,
> Anastasios
>
>
> On Sun, Jan 15, 2017 at 5:39 PM, Fei Hu <hufe...@gmail.com> wrote:
>
>> Hi Anastasios,
>>
>> Thanks for your reply. If I just make numPartitions twice as large,
>> how does coalesce(numPartitions: Int, shuffle: Boolean = false) keep
>> the data locality? Do I need to define my own Partitioner?
>>
>> Thanks,
>> Fei
>>
>> On Sun, Jan 15, 2017 at 3:58 AM, Anastasios Zouzias <zouz...@gmail.com>
>> wrote:
>>
>>> Hi Fei,
>>>
>>> Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
>>>
>>> https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395
>>>
>>> coalesce is mostly used for reducing the number of partitions before
>>> writing to HDFS, but it might still be a narrow dependency (satisfying
>>> your requirements) if you increase the # of partitions.
>>>
>>> Best,
>>> Anastasios
>>>
>>> On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufe...@gmail.com> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I want to divide an RDD partition equally into two partitions. That
>>>> is, the first half of the elements in the partition will form one new
>>>> partition, and the second half will form another new partition. The
>>>> two new partitions must be on the same node as their parent
>>>> partition, which helps achieve high data locality.
>>>>
>>>> Does anyone know how to implement this, or have any hints?
>>>>
>>>> Thanks in advance,
>>>> Fei
>>>>
>>>>
>>>
>>>
>>> --
>>> -- Anastasios Zouzias
>>> <a...@zurich.ibm.com>
>>>
>>
>>
>
>
> --
> -- Anastasios Zouzias
> <a...@zurich.ibm.com>
>
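
For reference, here is a rough sketch of the kind of custom RDD the thread is circling around: each parent partition is split into two child partitions through a narrow dependency, and each child reports its parent's preferred locations. This is not taken from CoalescedRDD or from anyone's actual solution in this thread. The names HalfPartition, SplitDependency and SplitInTwoRDD are made up for illustration, and the code relies only on the public/developer RDD hooks (getPartitions, getPreferredLocations, compute). Note that preferred locations are a scheduling hint, so co-location with the parent is likely but not guaranteed.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{NarrowDependency, Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical child partition: one half of a single parent partition.
class HalfPartition(
    override val index: Int,
    val parent: Partition,
    val firstHalf: Boolean) extends Partition

// Hypothetical narrow dependency: child i depends only on parent i / 2.
class SplitDependency[T](rdd: RDD[T]) extends NarrowDependency[T](rdd) {
  override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
}

// Hypothetical RDD that splits every parent partition into two halves.
class SplitInTwoRDD[T: ClassTag](prev: RDD[T])
  extends RDD[T](prev.context, Seq(new SplitDependency[T](prev))) {

  // Parent partition p yields children 2p (first half) and 2p + 1 (second half).
  override protected def getPartitions: Array[Partition] =
    prev.partitions.flatMap { p =>
      Seq[Partition](
        new HalfPartition(2 * p.index, p, firstHalf = true),
        new HalfPartition(2 * p.index + 1, p, firstHalf = false))
    }

  // Ask the scheduler to place each half where its parent would prefer to run.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    prev.preferredLocations(split.asInstanceOf[HalfPartition].parent)

  // Materialize the parent partition, then keep only the requested half.
  // When the size is odd, the first half receives the extra element.
  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val half = split.asInstanceOf[HalfPartition]
    val elems = prev.iterator(half.parent, context).toArray
    val mid = (elems.length + 1) / 2
    if (half.firstHalf) elems.iterator.take(mid) else elems.iterator.drop(mid)
  }
}
```

Usage would look like `new SplitInTwoRDD(parentRdd)`. Two caveats with this sketch: each parent partition is evaluated once per half unless the parent is cached, and buffering a partition into an array assumes it fits in executor memory.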