Hello everyone, I am writing simple word counts to HDFS using messages.saveAsHadoopFiles("hdfs://user/ec2-user/", "csv", String.class, String.class, (Class) TextOutputFormat.class);
1) However, every 2 seconds I get a new *directory* that is titled as if it were a csv. So I'll have test.csv, which is actually a directory containing two files named part-00000 and part-00001 (something like that). This obviously makes it very hard for me to read the data stored in the csv files. I am wondering if there is a better way to store the JavaPairReceiverDStream and JavaPairDStream?

2) I know there is a copy/merge Hadoop API for merging files... can this be done inside Java? I am not sure of the logic behind this API if I am using Spark Streaming, which is constantly making new files.

Thanks a lot for the help!
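For question 2, here is roughly what I had in mind — a sketch only, assuming FileUtil.copyMerge is the right API for this (the paths are placeholders for my actual output directories, and I'm not sure how this interacts with streaming jobs that keep creating new directories):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeParts {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Source: the "test.csv" directory Spark wrote,
        // containing part-00000, part-00001, ...
        Path srcDir = new Path("/user/ec2-user/test.csv");

        // Destination: a single merged csv file
        Path dstFile = new Path("/user/ec2-user/test-merged.csv");

        // copyMerge concatenates every part file under srcDir into dstFile.
        // 'false' keeps the source directory; "" appends no separator
        // string between files.
        FileUtil.copyMerge(fs, srcDir, fs, dstFile, false, conf, "");
    }
}
```

Would something like this work per batch, or is there a cleaner way to get one file per interval?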