Unfortunately this is expensive to do on HDFS, because you'd need a single writer to write the whole file. If your dataset is small enough for that, you can call coalesce(1) on the RDD to bring all the data to one node, and then save it. However, most HDFS applications work with directories containing multiple files instead of single files for this reason.
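A minimal sketch of the coalesce approach (the input and output paths are hypothetical, and this assumes a running Spark context against HDFS):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SingleFileSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("single-file-save"))

    val data = sc.textFile("hdfs:///input/dir") // hypothetical input path

    // coalesce(1) funnels all partitions through a single task, so the
    // output directory contains exactly one part file. This only works
    // if the whole dataset fits on one node.
    data.coalesce(1).saveAsTextFile("hdfs:///output/single") // hypothetical output path

    sc.stop()
  }
}
```

Note that the result is still a directory containing a single part file (e.g. part-00000), not a bare file. If you need a true standalone file, you can also write the partitioned output normally and merge it afterwards with `hdfs dfs -getmerge <dir> <localfile>`.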
Matei

On Jan 6, 2014, at 10:56 PM, Nan Zhu <[email protected]> wrote:

> Hi, all
>
> maybe a stupid question, but is there any way to make Spark write a single
> file instead of partitioned files?
>
> Best,
>
> --
> Nan Zhu
