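saveAsTextFile() is implemented in terms of Hadoop's TextOutputFormat, which writes one record per line: https://github.com/apache/incubator-spark/blob/v0.8.0-incubating/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L816
Each RDD element is written as the string returned by its toString(), with a null key, so no tab separator appears. A tuple such as (1, "a") comes out as the line (1,a).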
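Because each element becomes exactly one output line, a CSV file can be produced by formatting every element as a comma-separated string before saving. A minimal sketch, assuming an RDD of (String, Int, Double) tuples; the object name, sample rows, and default output path are hypothetical, and the RFC 4180-style quoting of fields that contain commas or quotes is an added assumption beyond plain comma-joining:

```scala
import org.apache.spark.SparkContext

object SaveRddAsCsv {
  // Hypothetical helper: quote a field only if it contains a comma, a double
  // quote, or a line break, doubling any embedded quotes (RFC 4180 style).
  def csvField(s: String): String =
    if (s.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + s.replace("\"", "\"\"") + "\""
    else s

  // Hypothetical helper: turn one record into one CSV line (nulls become empty fields).
  def toCsvLine(fields: Seq[Any]): String =
    fields.map(f => csvField(if (f == null) "" else f.toString)).mkString(",")

  def main(args: Array[String]): Unit = {
    // Assumed output location: pass an hdfs:// URI as the first argument to write to HDFS.
    val out = if (args.nonEmpty) args(0) else "/tmp/people-csv"
    val sc = new SparkContext("local[2]", "SaveRddAsCsv")
    val rows = sc.parallelize(Seq(("alice", 30, 1.5), ("bob, jr.", 25, 2.0)))
    // map() runs on the workers, so each partition formats and writes its own lines;
    // nothing is collected to the driver.
    rows.map { case (name, age, score) => toCsvLine(Seq(name, age, score)) }
      .saveAsTextFile(out)
    sc.stop()
  }
}
```

The output path is a directory containing one part-NNNNN file per partition, each written by its own task in parallel. Current Spark versions refuse by default to write into an output directory that already exists.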
You could map() each entry in your RDD into a comma-separated string, then write those strings using saveAsTextFile().

On Wed, Oct 30, 2013 at 7:10 PM, Andre Schumacher <[email protected]> wrote:

> Hi,
>
> Can you use saveAsTextFile? See
>
> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD
>
> I'm not sure what the default field separator is (tab, probably), but if
> you don't mind that, it may work. No need to collect it to the master.
>
> Andre
>
> On 10/30/2013 06:34 PM, Shay Seng wrote:
> > What's the recommended way to save an RDD as a CSV on, say, HDFS?
> > Do I have to collect the RDD and save it from the master, or is there
> > some way I can write out the CSV file in parallel to HDFS?
> >
> > tks
> > shay
