Re: Saving Dstream into a single file
You can use the coalesce method to reduce the number of partitions. You can reduce to one partition if the data is not too big, then write the output.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly)
http://shop.oreilly.com/product/0636920033073.do
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Mon, Mar 16, 2015 at 2:42 PM, Zhan Zhang zzh...@hortonworks.com wrote:
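A minimal sketch of this approach, assuming a Spark 1.x streaming job with a DStream[(String, Int)] named `wordCounts` and an output path `outputDir` (both hypothetical names, not from the thread):

```scala
// Sketch only: `wordCounts` and `outputDir` are assumed names.
wordCounts.foreachRDD { (rdd, time) =>
  // coalesce(1) funnels the whole batch through a single task, so each
  // batch directory contains a single part file. Only appropriate when
  // one batch of data comfortably fits on a single executor.
  rdd.coalesce(1).saveAsTextFile(s"$outputDir/batch-${time.milliseconds}")
}
```

Note that this still writes one directory per batch; it reduces the file count per batch to one rather than producing a single file for the whole stream.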
Re: Saving Dstream into a single file
Each RDD has multiple partitions, and each partition produces one HDFS file when the output is saved. I don't think multiple file handles are allowed to write to the same HDFS file. You can still load multiple files into Hive tables, right?

Thanks,
Zhan Zhang

On Mar 15, 2015, at 7:31 AM, tarek_abouzeid tarek.abouzei...@yahoo.com wrote:
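On the Hive side, one common pattern is an external table over the output directory; Hive then reads every file placed directly under that location, so the data never needs to be merged into one file first. A sketch using the Spark 1.x HiveContext (the table name, columns, delimiter, and path are illustrative assumptions):

```scala
// Sketch: assumes an existing HiveContext named `hiveContext`;
// table name, columns, and LOCATION are illustrative only.
hiveContext.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS word_counts (word STRING, cnt INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION 'hdfs:///user/hive/word_counts'
""")
// Queries such as SELECT * FROM word_counts read all files directly
// under LOCATION, so many small part files are queryable as one table.
```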
Saving Dstream into a single file
I am running the word count example on a Flume stream and trying to save the output as text files in HDFS, but the save directory ends up with multiple subdirectories, each containing small files. Is there a way to append to one large file instead of saving many small files? I intend to write the output into a Hive HDFS directory so I can query the results using Hive. I hope someone has a workaround for this issue. Thanks in advance.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Saving-Dstream-into-a-single-file-tp22058.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
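For context, the setup described above might look roughly like this with the Spark 1.x Flume integration (host, port, batch interval, and paths are placeholders; this is a sketch, not the poster's actual code):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val conf = new SparkConf().setAppName("FlumeWordCount")
val ssc = new StreamingContext(conf, Seconds(10))

// Receive Flume events and count words per batch.
val lines = FlumeUtils.createStream(ssc, "localhost", 4141)
  .map(e => new String(e.event.getBody.array(), "UTF-8"))
val counts = lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)

// saveAsTextFiles creates one directory per batch interval, named
// "<prefix>-<batch time ms>", each holding one part file per partition.
// That is the many-small-files layout described in the question.
counts.saveAsTextFiles("hdfs:///user/hive/word_counts/out")

ssc.start()
ssc.awaitTermination()
```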