Actually - looking at your use case, you may simply be saving the original RDD. Doing something like:

val newRdd = MyRdd.flatMap(func)
newRdd.saveAsTextFile(...)
may solve your issue.

On Thu, Jan 30, 2014 at 10:17 AM, Evan R. Sparks <[email protected]> wrote:

> Could it be that you have the same records that you get back from flatMap,
> just in a different order?
>
> On Thu, Jan 30, 2014 at 1:05 AM, Archit Thakur <[email protected]> wrote:
>
>> Needless to say, it works fine with int/string (primitive) types.
>>
>> On Wed, Jan 29, 2014 at 2:04 PM, Archit Thakur <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am facing a general problem with the flatMap operation on an RDD.
>>>
>>> I am doing:
>>>
>>> MyRdd.flatMap(func(_))
>>> MyRdd.saveAsTextFile(..)
>>>
>>> func(Tuple2[Key, Value]): List[Tuple2[MyCustomKey, MyCustomValue]] = {
>>>   // ...
>>>   println(list)
>>>   list
>>> }
>>>
>>> Now, if I compare the list printed in the worker logs with the text file
>>> it has created, they differ.
>>>
>>> The number of records is the same, but the actual records in the file
>>> differ from the ones in the logs.
>>>
>>> Does Spark modify keys/values in between? What other operations does it
>>> perform on the key or value?
>>>
>>> Thanks and Regards,
>>> Archit Thakur.
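For reference, here is a minimal self-contained sketch of the pattern Evan suggests. The names (MyCustomKey, MyCustomValue, func, the input data, and the output path) are stand-ins, since the original classes are not shown in the thread:

import org.apache.spark.{SparkConf, SparkContext}

case class MyCustomKey(id: Int)
case class MyCustomValue(payload: String)

object FlatMapSaveExample {
  // Hypothetical function with the same shape as the one described:
  // Tuple2[Key, Value] => List[Tuple2[MyCustomKey, MyCustomValue]]
  def func(kv: (Int, String)): List[(MyCustomKey, MyCustomValue)] =
    List((MyCustomKey(kv._1), MyCustomValue(kv._2)))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("flatmap-save").setMaster("local[*]"))

    val myRdd = sc.parallelize(Seq((1, "a"), (2, "b")))

    // flatMap does not modify myRdd; it returns a new RDD.
    // Saving myRdd here would write the original, untransformed records.
    val newRdd = myRdd.flatMap(func)

    // Save the transformed RDD. saveAsTextFile writes each element's
    // toString, so case classes (or a custom toString on your key/value
    // types) keep the file comparable to what println shows in the logs.
    newRdd.saveAsTextFile("/tmp/flatmap-save-example")

    sc.stop()
  }
}

RDD transformations such as flatMap are lazy and always return a new RDD rather than mutating the one they are called on, which is why calling MyRdd.saveAsTextFile(..) after MyRdd.flatMap(func(_)) writes the untransformed records.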
