What is the ++ operator here? Is this something you defined?

Another issue is that RDDs are not ordered, so when you union two of them
together the result doesn't have a well-defined ordering.

If you do want to do this, you could coalesce into one partition, then
call mapPartitions and return an iterator that first emits your header
and then the rest of the rows, then call saveAsTextFile. Keep in mind
this will only work if you coalesce into a single partition.

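myRdd.coalesce(1)                                        // one partition => one header
  .map(_.mkString(","))                                   // format each row as a CSV line
  .mapPartitions(it => Iterator("col1,col2,col3") ++ it)  // lazily prepend the header
  .saveAsTextFile("out.csv")                              // writes out.csv/part-00000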
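To see why the single partition matters, here is a rough plain-Scala model (no Spark needed) of the mapPartitions step: the function is applied once per partition, so every partition gets its own copy of the header. The names HeaderPerPartition, withHeader and simulate are hypothetical, and modeling each partition as a Seq[String] of already formatted lines is an assumption made only for illustration; this is not Spark's implementation.

```scala
// Hypothetical plain-Scala model of the mapPartitions step; no Spark involved.
// Assumption: each "partition" is a Seq[String] of already formatted CSV lines.
object HeaderPerPartition {

  // The kind of function you would hand to mapPartitions: prepend the header
  // lazily, so the partition's rows are streamed instead of copied into a Seq.
  def withHeader(header: String)(rows: Iterator[String]): Iterator[String] =
    Iterator.single(header) ++ rows

  // Apply the function to every partition independently (as mapPartitions does)
  // and concatenate the outputs, roughly like reading part-00000, part-00001, ...
  def simulate(partitions: Seq[Seq[String]], header: String): Seq[String] =
    partitions.flatMap(p => withHeader(header)(p.iterator).toList)

  def main(args: Array[String]): Unit = {
    val header = "col1,col2,col3"
    val onePartition = Seq(Seq("1,2,3", "4,5,6"))
    val twoPartitions = Seq(Seq("1,2,3"), Seq("4,5,6"))

    // One partition: the header appears exactly once, at the top.
    println(simulate(onePartition, header).mkString("\n"))
    println("---")
    // Two partitions: the header is repeated before each partition's rows.
    println(simulate(twoPartitions, header).mkString("\n"))
  }
}
```

With two partitions the model emits the header twice, which is why the coalesce(1) is needed before mapPartitions.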

- Patrick

On Wed, Jan 22, 2014 at 11:12 AM, Aureliano Buendia wrote:
> Hi,
>
> I'm trying to find a way to create a csv header when using saveAsTextFile,
> and I came up with this:
>
> (sc.makeRDD(Array("col1,col2,col3"), 1) ++
> myRdd.coalesce(1).map(_.mkString(",")))
>       .saveAsTextFile("out.csv")
>
> But it only saves the header part. Why is it that the union method does not
> return both RDDs?
