How about using mapToPair and swapping the two elements of each pair? The code is below. Would converting it this way be efficient?

JavaPairRDD<Long, MatcherReleventData> RddForMarch = matchRdd.zipWithIndex().mapToPair(
        new PairFunction<Tuple2<VendorRecord, Long>, Long, MatcherReleventData>() {
            @Override
            public Tuple2<Long, MatcherReleventData> call(Tuple2<VendorRecord, Long> t) throws Exception {
                // Swap the pair so the index (t._2) becomes the key, converting the record (t._1) on the way.
                MatcherReleventData matcherData = new MatcherReleventData();
                Tuple2<Long, MatcherReleventData> tuple = new Tuple2<Long, MatcherReleventData>(
                        t._2, matcherData.convertVendorDataToMatcherData(t._1));
                return tuple;
            }
        }).cache();

On 13 April 2015 at 03:11, Ted Yu <yuzhih...@gmail.com> wrote:

> Please also take a look at ZippedWithIndexRDDPartition which is 72 lines
> long.
>
> You can create your own version which extends RDD[(Long, T)]
>
> Cheers
>
> On Sun, Apr 12, 2015 at 1:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> bq. will return something like JavaPairRDD<Object, long>
>>
>> The long component of the pair fits your description of index. What other
>> requirement does ZipWithIndex not provide you ?
>>
>> Cheers
>>
>> On Sun, Apr 12, 2015 at 1:16 PM, Jeetendra Gangele <gangele...@gmail.com>
>> wrote:
>>
>>> Hi All I have an RDD JavaRDD<Object> and I want to convert it to
>>> JavaPairRDD<Index,Object>.. Index should be unique and it should maintain
>>> the order. For first object It should have 1 and then for second 2 like
>>> that.
>>>
>>> I tried using ZipWithIndex but it will return something like
>>> JavaPairRDD<Object, long>
>>> I wanted to use this RDD for lookup and join operation later in my
>>> workflow so ordering is important.
>>>
>>>
>>> Regards
>>> jeet
>>>
>>
>>
>
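
For illustration, here is a Spark-free sketch in plain Java of the shape that the zipWithIndex-then-swap step in my code produces: each element paired with its position, with the index moved to the key side. The names IndexSwapSketch and indexFirst are hypothetical, and a List stands in for the RDD, so this says nothing about how Spark performs. Indices start at 0 here, as they do with Spark's zipWithIndex; add 1 if numbering from 1 is needed, as in the original request.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the RDD pipeline: a List plays the role of the RDD.
final class IndexSwapSketch {

    // Pairs each element with its 0-based position and puts the index first,
    // mirroring zipWithIndex followed by a swap of the pair.
    static <T> List<Map.Entry<Long, T>> indexFirst(List<T> items) {
        List<Map.Entry<Long, T>> out = new ArrayList<>(items.size());
        long index = 0L;
        for (T item : items) {
            out.add(new SimpleImmutableEntry<>(Long.valueOf(index), item));
            index++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("first", "second", "third");
        // Print each (index, record) pair in input order.
        for (Map.Entry<Long, String> e : indexFirst(records)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The input order is preserved because the index is assigned while iterating the list once, which is the property I need for the later lookup and join.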