Re: Save RDDs as CSV

Josh Rosen Wed, 30 Oct 2013 19:13:53 -0700

saveAsTextFile() is implemented in terms of Hadoop's TextOutputFormat,
which writes one record per line:
https://github.com/apache/incubator-spark/blob/v0.8.0-incubating/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L816


You could map() each entry in your RDD into a comma-separated string, then
write those strings using saveAsTextFile().
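
A rough sketch of that approach in Python (PySpark), in case it helps. The helper names to_csv_line and save_as_csv are made up for illustration, and I'm assuming each RDD element is a tuple or list of fields. The csv module handles quoting, so a field that itself contains a comma stays intact:

```python
import csv
import io


def to_csv_line(record):
    """Format one record (a tuple or list of fields) as a single CSV line.

    Hypothetical helper, not part of Spark.
    """
    buf = io.StringIO()
    # No line terminator here: saveAsTextFile() writes one record per line,
    # so the newline is added when the string is written out.
    csv.writer(buf, lineterminator="").writerow(record)
    return buf.getvalue()


def save_as_csv(rdd, path):
    """Map each record to a CSV string on the workers, then write in parallel.

    `rdd` is assumed to be a Spark RDD of tuples/lists; `path` is an output
    directory such as an HDFS URI. Hypothetical helper, not part of Spark.
    """
    rdd.map(to_csv_line).saveAsTextFile(path)
```

Nothing is collected to the master here. Each partition is written by the task that holds it, so the output at the given path is a directory of part files rather than one CSV file.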




On Wed, Oct 30, 2013 at 7:10 PM, Andre Schumacher <
[email protected]> wrote:

>
> Hi,
>
> Can you use saveAsTextFile? See
>
>
> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD
>
> I'm not sure what the default field separator is (Tab probably) but if
> you don't mind that may work? No need to collect it to the master.
>
> Andre
>
> On 10/30/2013 06:34 PM, Shay Seng wrote:
> > What's the recommended way to save an RDD as a CSV on, say, HDFS?
> > Do I have to collect the RDD and save it from the master, or is there
> > some way I can write out the CSV file in parallel to HDFS?
> >
> >
> > tks
> > shay
> >
>
>
