HDFS has had a concat() method since 0.21 (<https://issues.apache.org/jira/browse/HDFS-222>) which does exactly this, but I am not sure of the performance implications. Of course, as Matei pointed out, it's unusual to actually need a single HDFS file.
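For what it's worth, a minimal sketch of invoking concat() through the Hadoop FileSystem API might look like the following. The paths are hypothetical, and note that HDFS places restrictions on the block layout of the source files (they must share the target's block size), so this is not a drop-in solution:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: concat() splices the blocks of the source files onto the
// end of the target file inside the NameNode, so no data is rewritten.
// It is only implemented by DistributedFileSystem; the generic
// FileSystem base class throws UnsupportedOperationException.
val fs = FileSystem.get(new Configuration())
val target  = new Path("/output/part-00000")            // hypothetical path
val sources = Array(new Path("/output/part-00001"),     // hypothetical paths
                    new Path("/output/part-00002"))
fs.concat(target, sources)
// After this call the source files no longer exist; their blocks now
// belong to the target file.
```

Since the merge is a metadata operation, it should be cheap compared to re-reading and rewriting the data, but I haven't measured it.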
On Mon, Jan 6, 2014 at 9:08 PM, Matei Zaharia <[email protected]> wrote:

> Unfortunately this is expensive to do on HDFS — you’d need a single writer
> to write the whole file. If your file is small enough for that, you can use
> coalesce() on the RDD to bring all the data to one node, and then save it.
> However most HDFS applications work with directories containing multiple
> files instead of single files for this reason.
>
> Matei
>
> On Jan 6, 2014, at 10:56 PM, Nan Zhu <[email protected]> wrote:
>
>> Hi, all
>>
>> maybe a stupid question, but is there any way to make Spark write a
>> single file instead of partitioned files?
>>
>> Best,
>>
>> --
>> Nan Zhu
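The coalesce() approach Matei describes can be sketched as below, assuming the data is small enough to fit on one node. The input and output paths are hypothetical:

```scala
// Sketch only: coalesce(1) funnels all partitions into a single task,
// so the save step writes one part file instead of many. This puts the
// whole dataset on one executor, so it only makes sense for small data.
val rdd = sc.textFile("hdfs:///input/data")     // hypothetical input path
rdd.coalesce(1).saveAsTextFile("hdfs:///output/single")
// The output is still a directory, but it contains a single part-00000
// file (plus a _SUCCESS marker), which is usually close enough.
```

Note that coalesce(1) also serializes the write through one task, which is exactly the "single writer" cost Matei mentions.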
