Thank you Jakob, it works for me.

On Sat, Sep 10, 2016 at 12:54 AM, Jakob Odersky <ja...@odersky.com> wrote:
> > Hi Jakob, I have a DataFrame with like 10 partitions. Based on the exact
> > content of each partition I want to batch load some other data from a DB. I
> > cannot operate in parallel due to resource constraints I have, hence I want
> > to sequentially iterate over each partition and perform operations.
>
> Ah, I see. I think in that case your best option is to run several
> jobs, selecting a different subset of your DataFrame for each job and
> running them one after the other. One way to do that would be to get
> the underlying RDD, map each element with its partition's index, and
> then filter and iterate over every element. E.g.:
>
> val withPartitionIndex = df.rdd.mapPartitionsWithIndex((idx, it) =>
>   it.map(elem => (idx, elem)))
>
> for (i <- 0 until n) {
>   withPartitionIndex.filter { case (idx, _) => idx == i }.foreach {
>     case (idx, elem) =>
>       // do something with elem
>   }
> }
>
> It's not the best use case for Spark though and will probably be a
> performance bottleneck.
>
> On Fri, Sep 9, 2016 at 11:45 AM, Jakob Odersky <ja...@odersky.com> wrote:
> > Hi Sujeet,
> >
> > going sequentially over all parallel, distributed data seems like a
> > counter-productive thing to do. What are you trying to accomplish?
> >
> > regards,
> > --Jakob
> >
> > On Fri, Sep 9, 2016 at 3:29 AM, sujeet jog <sujeet....@gmail.com> wrote:
> >> Hi,
> >> Is there a way to iterate over a DataFrame with n partitions
> >> sequentially,
> >>
> >> Thanks,
> >> Sujeet
> >>
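For anyone who finds this thread later: below is a self-contained sketch of the approach Jakob describes (tag each row with its partition index, then launch one filtered job per index so the partitions are processed one after the other). It assumes Spark 2.x with a SparkSession; the placeholder DataFrame, the object name, and the println stand in for the real per-partition DB work.

    import org.apache.spark.sql.SparkSession

    object SequentialPartitions {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("sequential-partitions")
          .getOrCreate()

        // Placeholder DataFrame with a known number of partitions;
        // in practice this would be the real input data.
        val df = spark.range(0, 100).toDF("value").repartition(10)

        // Tag each row with the index of the partition it belongs to.
        val withPartitionIndex = df.rdd
          .mapPartitionsWithIndex((idx, it) => it.map(row => (idx, row)))
          .cache() // avoid recomputing the lineage once per partition

        val n = withPartitionIndex.getNumPartitions

        // One Spark job per partition, launched sequentially from the driver.
        for (i <- 0 until n) {
          withPartitionIndex
            .filter { case (idx, _) => idx == i }
            .foreach { case (_, row) =>
              // do something with row, e.g. the per-partition DB batch load;
              // note this runs on the executors, not on the driver
              println(row)
            }
        }

        withPartitionIndex.unpersist()
        spark.stop()
      }
    }

As Jakob notes, each iteration still scans the whole tagged RDD just to keep one partition, so this is a workaround for external resource constraints rather than an efficient Spark pattern; caching withPartitionIndex at least avoids recomputing the upstream DataFrame n times.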