Hi everyone, I have a JavaPairDStream<Integer, String> object and I'd like the Driver to create a txt file (on HDFS) containing all of its elements.
At the moment, I use the /coalesce(1, true)/ method:

    JavaPairDStream<Integer, String> unified = [partitioned stuff]

    unified.foreachRDD(new Function<JavaPairRDD<Integer, String>, Void>() {
        public Void call(JavaPairRDD<Integer, String> arg0) throws Exception {
            arg0.coalesce(1, true).saveAsTextFile(<HDFS path>);
            return null;
        }
    });

but this means a /single worker/ receives all the data and writes it to HDFS, which could become a major bottleneck. How can I have the Driver do the writing instead of a worker? I read that /collect()/ might do this, but I'm not sure how to implement it correctly. Can anybody help me? Thanks in advance.
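For reference, here is a rough sketch of what I have in mind with /collect()/ (the HDFS URI "hdfs://namenode:8020", the output path, and the timestamped file name are just placeholders I made up). Is this the right direction?

    import java.io.PrintWriter;
    import java.net.URI;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.Function;

    import scala.Tuple2;

    // 'unified' is the JavaPairDStream<Integer, String> from above
    unified.foreachRDD(new Function<JavaPairRDD<Integer, String>, Void>() {
        public Void call(JavaPairRDD<Integer, String> rdd) throws Exception {
            // collect() ships every element of this batch's RDD back to the Driver JVM,
            // so the whole batch has to fit in Driver memory.
            List<Tuple2<Integer, String>> elements = rdd.collect();

            // Open a file on HDFS from the Driver and write the elements out.
            FileSystem fs = FileSystem.get(
                    new URI("hdfs://namenode:8020"), new Configuration());
            Path out = new Path("/user/me/output/batch-" + System.currentTimeMillis() + ".txt");
            try (PrintWriter writer = new PrintWriter(fs.create(out))) {
                for (Tuple2<Integer, String> t : elements) {
                    writer.println(t._1() + "\t" + t._2());
                }
            }
            return null;
        }
    });

My understanding is that this only makes sense if each batch is small enough to fit in the Driver's memory; otherwise the coalesce(1) approach (or writing part files and merging afterwards) might still be safer.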