To do that, first key the RDD by the field you want to split on, then call
saveAsHadoopFile like this:
saveAsHadoopFile(location, classOf[KeyClass],
classOf[ValueClass], classOf[PartitionOutputFormat])
where PartitionOutputFormat is a class that extends MultipleTextOutputFormat.
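A minimal sketch of what that could look like, assuming the old `org.apache.hadoop.mapred` API and a local Spark context. The `SplitByKey` object, the `/tmp/customers` output path, and the choice to use the key as a subdirectory name are illustrative assumptions, not part of the original answer:

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Output format that routes every record to a location derived from its key.
class PartitionOutputFormat extends MultipleTextOutputFormat[Any, Any] {

  // Return a NullWritable key so that only the value is written to the line.
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()

  // Assumption: place each key's records under a subdirectory named after the key.
  // `name` is the default part file name (e.g. part-00000).
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    s"${key.toString}/$name"
}

object SplitByKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SplitByKey").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Sample records in the form customerId,name,age,sex
    val lines = sc.parallelize(Seq(
      "111,abc,34,M",
      "122,xyz,32,F",
      "111,def,31,F",
      "122,trp,30,F",
      "133,jkl,35,M"
    ))

    // Key each record by customerId (the first comma-separated field).
    val keyed = lines.map(line => (line.split(",")(0).trim, line))

    // Hypothetical output path; it must not already exist.
    keyed.saveAsHadoopFile(
      "/tmp/customers",
      classOf[String],
      classOf[String],
      classOf[PartitionOutputFormat]
    )

    sc.stop()
  }
}
```

With this layout, records for the same customerId end up under the same subdirectory of the output path, possibly spread over several part files if the key appears in more than one partition. The sketch works on plain (key, value) pairs; an RDD that has already been grouped into (key, Iterable[Value]) would probably need to be flattened back to pairs first, for example with flatMapValues.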
Sample for t
I need to split an RDD[(Key, Iterable[Value])] so that each key is saved to a
different file.
E.g., I have records like this (customerId, name, age, sex):
111,abc,34,M
122,xyz,32,F
111,def,31,F
122,trp,30,F
133,jkl,35,M
I need to write 3 different files, one per customerId:
file1:
111,abc,34,M
111,def,31,F
file2: