Re: save spark streaming output to single file on hdfs

2015-01-15 Thread Prannoy
Hi,

You can use the FileUtil.copyMerge API and point it at the folder where
saveAsTextFile has written the part files.

Suppose your directory is /a/b/c/

use FileUtil.copyMerge(source FileSystem, /a/b/c, destination FileSystem,
path to the merged file, say /a/b/c.txt, true to delete the original dir,
the Hadoop Configuration, and null for the separator string).
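
Something along these lines (a rough sketch from a Scala shell, assuming
source and destination sit on the same HDFS and the Hadoop 2.x copyMerge
signature; the paths are just the placeholders from above):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Merge the part-* files that saveAsTextFile wrote under /a/b/c
// into a single file /a/b/c.txt on the same HDFS.
val conf = new Configuration()
val fs   = FileSystem.get(conf)

FileUtil.copyMerge(
  fs, new Path("/a/b/c"),      // source FileSystem and directory of part files
  fs, new Path("/a/b/c.txt"),  // destination FileSystem and merged file
  true,                        // delete the source directory after merging
  conf,
  null)                        // no separator string added between files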

Thanks.

On Tue, Jan 13, 2015 at 11:34 PM, jamborta [via Apache Spark User List] 
ml-node+s1001560n21124...@n3.nabble.com wrote:

 Hi all,

 Is there a way to save DStream RDDs to a single file so that another
 process can pick it up as a single RDD?
 It seems that each slice is saved to a separate folder, using the
 saveAsTextFiles method.

 I'm using Spark 1.2 with PySpark.

 thanks,

Re: save spark streaming output to single file on hdfs

2015-01-15 Thread jamborta
Thanks for the replies. Very useful.



Re: save spark streaming output to single file on hdfs

2015-01-13 Thread Tamas Jambor
Thanks. The problem is that we'd like it to be picked up by Hive.

On Tue Jan 13 2015 at 18:15:15 Davies Liu dav...@databricks.com wrote:

 On Tue, Jan 13, 2015 at 10:04 AM, jamborta jambo...@gmail.com wrote:
  Hi all,
 
  Is there a way to save DStream RDDs to a single file so that another
  process
  can pick it up as a single RDD?

 It does not need to be a single file; Spark can read a whole directory as a
 single RDD.

 Also, it's easy to union multiple RDDs into a single one.

  It seems that each slice is saved to a separate folder, using the
  saveAsTextFiles method.

  I'm using Spark 1.2 with PySpark.
 
  thanks,


Re: save spark streaming output to single file on hdfs

2015-01-13 Thread Davies Liu
On Tue, Jan 13, 2015 at 10:04 AM, jamborta jambo...@gmail.com wrote:
 Hi all,

 Is there a way to save DStream RDDs to a single file so that another process
 can pick it up as a single RDD?

It does not need to be a single file; Spark can read a whole directory as a single RDD.

Also, it's easy to union multiple RDDs into a single one.
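
For instance, from a Scala shell (just a sketch; the paths below are
placeholders for whatever per-batch directories saveAsTextFiles produced):

// Reading a directory (or a glob over several batch directories)
// gives one RDD covering all the part files inside it:
val all = sc.textFile("hdfs:///streaming/output/out-*")

// Or load batches separately and union them into a single RDD:
val a = sc.textFile("hdfs:///streaming/output/out-1421170000000")
val b = sc.textFile("hdfs:///streaming/output/out-1421170010000")
val combined = a.union(b)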

 It seems that each slice is saved to a separate folder, using the
 saveAsTextFiles method.

 I'm using Spark 1.2 with PySpark.

 thanks,



Re: save spark streaming output to single file on hdfs

2015-01-13 Thread Davies Liu
Right now, you can't. You could load each file as a partition into
Hive, or you need to pack the files together with another tool or a
Spark job.
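
For example, a small Spark job along these lines could pack the per-batch
output into one place (a sketch only; the paths are made up, and
coalesce(1) funnels everything through a single task):

// Glob all per-batch directories written by saveAsTextFiles and rewrite
// them as one output directory containing a single part-00000 file.
val merged = sc.textFile("hdfs:///streaming/output/out-*")
merged
  .coalesce(1)                                // one partition => one part file
  .saveAsTextFile("hdfs:///streaming/merged") // Hive can then point at this path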

On Tue, Jan 13, 2015 at 10:35 AM, Tamas Jambor jambo...@gmail.com wrote:
 Thanks. The problem is that we'd like it to be picked up by Hive.


 On Tue Jan 13 2015 at 18:15:15 Davies Liu dav...@databricks.com wrote:

 On Tue, Jan 13, 2015 at 10:04 AM, jamborta jambo...@gmail.com wrote:
  Hi all,
 
  Is there a way to save DStream RDDs to a single file so that another
  process
  can pick it up as a single RDD?

 It does not need to be a single file; Spark can read a whole directory as a
 single RDD.

 Also, it's easy to union multiple RDDs into a single one.

  It seems that each slice is saved to a separate folder, using the
  saveAsTextFiles method.

  I'm using Spark 1.2 with PySpark.
 
  thanks,