Hello everyone, I am writing simple word counts to HDFS using messages.saveAsHadoopFiles("hdfs://user/ec2-user/", "csv", String.class, String.class, (Class) TextOutputFormat.class);
1) However, every 2 seconds I get a new *directory* that is titled as if it were a csv. So I'll have test.csv, which is actually a directory containing two files named part-00000 and part-00001 (something like that). This obviously makes it very hard for me to read the data stored in the csv files. I am wondering if there is a better way to store the JavaPairReceiverDStream and JavaPairDStream?

2) I know there is a copy/merge Hadoop API for merging files... can this be done inside Java? I am not sure of the logic behind this API if I am using Spark Streaming, which is constantly making new files.

Thanks a lot for the help!
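For question 2, here is roughly what I had in mind — a sketch only, assuming FileUtil.copyMerge is the right API for this (the paths are placeholders for my actual output directories, and I'm not sure how this interacts with streaming jobs that keep creating new directories):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeParts {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Source: the "test.csv" directory Spark wrote,
        // containing part-00000, part-00001, ...
        Path srcDir = new Path("/user/ec2-user/test.csv");

        // Destination: a single merged csv file
        Path dstFile = new Path("/user/ec2-user/test-merged.csv");

        // copyMerge concatenates every part file under srcDir into dstFile.
        // 'false' keeps the source directory; "" appends no separator
        // string between files.
        FileUtil.copyMerge(fs, srcDir, fs, dstFile, false, conf, "");
    }
}
```

Would something like this work per batch, or is there a cleaner way to get one file per interval?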