Hello,
Spark 1.1.0, Hadoop 2.4.1
I have written a Spark Streaming application, and I am getting a
FileAlreadyExistsException from rdd.saveAsTextFile(outputFolderPath).
Here is, briefly, what I am trying to do.
My application creates a text file stream using a JavaStreamingContext. The
input file is on HDFS.
JavaDStream<String> textStream = ssc.textFileStream(InputFile);
Then it compares each line of the input stream against some data and filters
it. The filtered data is stored in a JavaDStream<String>:
JavaDStream<String> suspectedStream = textStream.flatMap(
    new FlatMapFunction<String, String>() {
        @Override
        public Iterable<String> call(String line) throws Exception {
            List<String> filteredList = new ArrayList<String>();
            // doing filter job
            return filteredList;
        }
    });
And I am storing this filtered stream to HDFS as:
suspectedStream.foreach(new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> rdd) throws Exception {
        rdd.saveAsTextFile(outputFolderPath);
        return null;
    }
});
But with this I am receiving an
org.apache.hadoop.mapred.FileAlreadyExistsException, since saveAsTextFile
refuses to write into a directory that already exists.
I tried appending a random number to outputFolderPath, and that works.
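For reference, the random-suffix workaround looks roughly like this (just a
sketch; the UniqueOutputPath class and the suffix scheme are illustrative,
not from my real application):

```java
import java.util.UUID;

// Sketch of the workaround: append a random suffix so each micro-batch
// writes into a fresh directory that saveAsTextFile can create.
public class UniqueOutputPath {

    static String uniquePath(String base) {
        // e.g. /user/out -> /user/out-3f1c9a...; a fresh directory per
        // batch avoids the exception, but the output gets scattered
        return base + "-" + UUID.randomUUID().toString();
    }
}
```

Inside the foreach I then call
rdd.saveAsTextFile(uniquePath(outputFolderPath)); instead, which avoids the
exception but spreads the results across many directories.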
But my requirement is to collect all of the output in one directory.
Can you please suggest a way to get rid of this exception?
Thanks,
Shailesh
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-How-to-write-RDD-s-in-same-directory-tp16962.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]