Hello,

I'm using Spark 1.1.0 with Hadoop 2.4.1.
I have written a Spark Streaming application, and I am getting a FileAlreadyExistsException from rdd.saveAsTextFile(outputFolderPath). Here is, briefly, what I am trying to do.

My application creates a text file stream using the Java streaming context; the input files are on HDFS:

    JavaDStream<String> textStream = ssc.textFileStream(InputFile);

It then compares each line of the input stream against some data and filters it. I store the filtered data in a JavaDStream<String>:

    JavaDStream<String> suspectedStream = textStream.flatMap(
        new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String line) throws Exception {
                List<String> filteredList = new ArrayList<String>();
                // doing filter job
                return filteredList;
            }
        });

And I store this filtered data in HDFS as:

    suspectedStream.foreach(new Function<JavaRDD<String>, Void>() {
        @Override
        public Void call(JavaRDD<String> rdd) throws Exception {
            rdd.saveAsTextFile(outputFolderPath);
            return null;
        }
    });

But with this I receive org.apache.hadoop.mapred.FileAlreadyExistsException. I tried appending a random number to outputFolderPath and that works, but my requirement is to collect all output under one directory. Can you please suggest a way to get rid of this exception?

Thanks,
Shailesh

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-How-to-write-RDD-s-in-same-directory-tp16962.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
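P.S. The "append a random number" workaround I mentioned can be made deterministic by writing each micro-batch to its own subdirectory under the single parent directory, named by the batch time. A minimal plain-Java sketch of that path scheme (no Spark dependency; the parent path and batch timestamp below are made-up examples, not from my actual job):

```java
// Sketch: build a unique per-batch output path under one parent directory,
// so saveAsTextFile never sees an already-existing directory.
public class BatchPaths {
    // outputFolderPath is the single parent directory all results live under;
    // batchTimeMs would come from the streaming batch time in a real job.
    static String batchOutputPath(String outputFolderPath, long batchTimeMs) {
        return outputFolderPath + "/batch-" + batchTimeMs;
    }

    public static void main(String[] args) {
        // Hypothetical parent dir and timestamp, for illustration only.
        System.out.println(batchOutputPath("/user/shailesh/output", 1411500000000L));
        // -> /user/shailesh/output/batch-1411500000000
    }
}
```

All batches then still land under one parent directory, which can be read back in one pass with a glob such as outputFolderPath + "/batch-*".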