Re: Saving Dstream into a single file

2015-03-23 Thread Dean Wampler
You can use the coalesce method to reduce the number of partitions. If the data
is not too big, you can reduce to a single partition and then write the output.
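
For example, a minimal sketch (assuming the counts are already in a
DStream[(String, Int)] called wordCounts; that name and the output path are
placeholders, not from this thread):

  wordCounts.foreachRDD { (rdd, time) =>
    if (rdd.take(1).nonEmpty) {
      // coalesce(1) pulls each batch into a single partition, so the save
      // below writes one part file per batch instead of many small ones.
      // Only do this when a batch comfortably fits on one executor.
      rdd.map { case (w, c) => s"$w,$c" }
         .coalesce(1)
         .saveAsTextFile(s"hdfs:///tmp/wordcount/batch-${time.milliseconds}")
    }
  }

Note this still produces one file per batch interval, not a single file for
the whole stream.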

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Mon, Mar 16, 2015 at 2:42 PM, Zhan Zhang zzh...@hortonworks.com wrote:

 Each RDD has multiple partitions, and each partition produces one HDFS file
 when the output is saved. I don't think multiple file handles are allowed to
 write to the same HDFS file. You can still load multiple files into a Hive
 table, right?

 Thanks.

 Zhan Zhang

 On Mar 15, 2015, at 7:31 AM, tarek_abouzeid tarek.abouzei...@yahoo.com
 wrote:

  I am running the word count example on a Flume stream and trying to save the
  output as text files in HDFS, but in the save directory I get multiple
  subdirectories, each containing many small files. Is there a way to append to
  one large file instead of saving lots of small ones? I intend to write the
  output into a Hive HDFS directory so I can query the result with Hive.

  I hope someone has a workaround for this issue. Thanks in advance.
 
 
 

Re: Saving Dstream into a single file

2015-03-16 Thread Zhan Zhang
Each RDD has multiple partitions, and each partition produces one HDFS file
when the output is saved. I don't think multiple file handles are allowed to
write to the same HDFS file. You can still load multiple files into a Hive
table, right?
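
For example, a rough sketch of pointing Hive at the whole output directory
(the table name, columns and path are made up for illustration, and sc is
assumed to be an existing SparkContext in a build with Hive support):

  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)

  // An external table over the directory the streaming job writes to; Hive
  // reads every part file sitting directly under LOCATION, so many small
  // files still show up as one queryable table. If each batch lands in its
  // own subdirectory, load those directories explicitly (LOAD DATA INPATH)
  // or point LOCATION at a directory that holds the part files themselves.
  hiveContext.sql(
    """CREATE EXTERNAL TABLE IF NOT EXISTS wordcount (word STRING, cnt INT)
      |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      |LOCATION '/tmp/wordcount'""".stripMargin)

  hiveContext.sql("SELECT word, cnt FROM wordcount ORDER BY cnt DESC LIMIT 20")
    .collect()
    .foreach(println)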

Thanks.

Zhan Zhang

On Mar 15, 2015, at 7:31 AM, tarek_abouzeid tarek.abouzei...@yahoo.com wrote:

 I am running the word count example on a Flume stream and trying to save the
 output as text files in HDFS, but in the save directory I get multiple
 subdirectories, each containing many small files. Is there a way to append to
 one large file instead of saving lots of small ones? I intend to write the
 output into a Hive HDFS directory so I can query the result with Hive.

 I hope someone has a workaround for this issue. Thanks in advance.
 
 
 

Saving Dstream into a single file

2015-03-15 Thread tarek_abouzeid
I am running the word count example on a Flume stream and trying to save the
output as text files in HDFS, but in the save directory I get multiple
subdirectories, each containing many small files. Is there a way to append to
one large file instead of saving lots of small ones? I intend to write the
output into a Hive HDFS directory so I can query the result with Hive.
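
The job is roughly like the sketch below (host, port, batch interval and paths
are placeholders, and it needs the spark-streaming-flume artifact on the
classpath):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.flume.FlumeUtils

  val conf = new SparkConf().setAppName("FlumeWordCount")
  val ssc  = new StreamingContext(conf, Seconds(30))

  // One line of text per Flume event.
  val lines = FlumeUtils.createStream(ssc, "localhost", 9999)
    .map(e => new String(e.event.getBody.array(), "UTF-8"))

  val wordCounts = lines.flatMap(_.split("\\s+"))
    .map(word => (word, 1))
    .reduceByKey(_ + _)

  // saveAsTextFiles creates a new directory per batch interval, each with one
  // part file per partition -- the "many small files" behaviour described above.
  wordCounts.map { case (w, c) => s"$w,$c" }
    .saveAsTextFiles("hdfs:///tmp/wordcount/out")

  ssc.start()
  ssc.awaitTermination()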

I hope someone has a workaround for this issue. Thanks in advance.


