HDFS, since 0.21 <https://issues.apache.org/jira/browse/HDFS-222>, has had a
concat() method that does exactly this, but I am not sure of the
performance implications. Of course, as Matei pointed out, it's unusual to
actually need a single HDFS file.
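A rough sketch of both approaches might look like the following. The paths and the SparkContext `sc` are illustrative, not from the thread; the concat() constraints (target must exist, sources must share its block size) are worth checking against your Hadoop version before relying on this.

```scala
// Sketch only: paths and `sc` are placeholders, not working config.
import org.apache.hadoop.fs.{FileSystem, Path}

val rdd = sc.textFile("hdfs:///data/input")

// Option 1 (Matei's suggestion): coalesce to one partition, then save.
// All output flows through a single task, so this only works when the
// whole result fits comfortably on one node.
rdd.coalesce(1).saveAsTextFile("hdfs:///data/single-file-out")

// Option 2: write partitioned output as usual, then stitch the part
// files together with concat() (HDFS-222). concat() moves blocks rather
// than rewriting data, which is why its cost profile differs from a
// plain copy-merge.
val fs = FileSystem.get(sc.hadoopConfiguration)
val parts = fs.globStatus(new Path("hdfs:///data/out/part-*")).map(_.getPath)
fs.concat(parts.head, parts.tail) // appends the rest onto the first part
```

If a local single file is acceptable, `hadoop fs -getmerge <dir> <localfile>` is the usual shortcut instead.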


On Mon, Jan 6, 2014 at 9:08 PM, Matei Zaharia <[email protected]> wrote:

> Unfortunately this is expensive to do on HDFS — you’d need a single writer
> to write the whole file. If your file is small enough for that, you can use
> coalesce() on the RDD to bring all the data to one node, and then save it.
> However, most HDFS applications work with directories containing multiple
> files instead of single files for this reason.
>
> Matei
>
> On Jan 6, 2014, at 10:56 PM, Nan Zhu <[email protected]> wrote:
>
> > Hi, all
> >
> > maybe a stupid question, but is there any way to make Spark write a
> single file instead of partitioned files?
> >
> > Best,
> >
> > --
> > Nan Zhu
> >
>
>
