Hi everyone, I have a JavaPairDStream<Integer, String> object and I'd like the Driver to create a txt file (on HDFS) containing all of its elements.
At the moment, I use the /coalesce(1, true)/ method:

    JavaPairDStream<Integer, String> unified = [partitioned stuff]

    unified.foreachRDD(new Function<JavaPairRDD<Integer, String>, Void>() {
        public Void call(JavaPairRDD<Integer, String> arg0) throws Exception {
            arg0.coalesce(1, true).saveAsTextFile(<HDFS path>);
            return null;
        }
    });

but this means a /single worker/ receives all the data and writes it to HDFS, which could become a major bottleneck. How can I have the Driver do the writing instead of a worker? I read that /collect()/ might do this, but I'm not sure how to implement it correctly. Can anybody help me? Thanks in advance.
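For reference, here is a rough sketch of what I have in mind with /collect()/ (the HDFS URI "hdfs://namenode:8020", the output path, and the timestamped file name are just placeholders I made up). Is this the right direction?

    import java.io.PrintWriter;
    import java.net.URI;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.Function;

    import scala.Tuple2;

    // 'unified' is the JavaPairDStream<Integer, String> from above
    unified.foreachRDD(new Function<JavaPairRDD<Integer, String>, Void>() {
        public Void call(JavaPairRDD<Integer, String> rdd) throws Exception {
            // collect() ships every element of this batch's RDD back to the Driver JVM,
            // so the whole batch has to fit in Driver memory.
            List<Tuple2<Integer, String>> elements = rdd.collect();

            // Open a file on HDFS from the Driver and write the elements out.
            FileSystem fs = FileSystem.get(
                    new URI("hdfs://namenode:8020"), new Configuration());
            Path out = new Path("/user/me/output/batch-" + System.currentTimeMillis() + ".txt");
            try (PrintWriter writer = new PrintWriter(fs.create(out))) {
                for (Tuple2<Integer, String> t : elements) {
                    writer.println(t._1() + "\t" + t._2());
                }
            }
            return null;
        }
    });

My understanding is that this only makes sense if each batch is small enough to fit in the Driver's memory; otherwise the coalesce(1) approach (or writing part files and merging afterwards) might still be safer.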