Hi, all

Thanks for the reply.
I actually need to provide a single file to an external system to process it… seems that I have to make the consumer of the file support multiple inputs.

Best,

--
Nan Zhu

On Tuesday, January 7, 2014 at 12:37 PM, Aaron Davidson wrote:

> HDFS, since 0.21 (https://issues.apache.org/jira/browse/HDFS-222), has a
> concat() method which would do exactly this, but I am not sure of the
> performance implications. Of course, as Matei pointed out, it's unusual to
> actually need a single HDFS file.
>
> On Mon, Jan 6, 2014 at 9:08 PM, Matei Zaharia <[email protected]> wrote:
>
> > Unfortunately this is expensive to do on HDFS — you’d need a single writer
> > to write the whole file. If your file is small enough for that, you can use
> > coalesce() on the RDD to bring all the data to one node, and then save it.
> > However most HDFS applications work with directories containing multiple
> > files instead of single files for this reason.
> >
> > Matei
> >
> > On Jan 6, 2014, at 10:56 PM, Nan Zhu <[email protected]> wrote:
> >
> > > Hi, all
> > >
> > > maybe a stupid question, but is there any way to make Spark write a
> > > single file instead of partitioned files?
> > >
> > > Best,
> > >
> > > --
> > > Nan Zhu
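For reference, a minimal sketch of the concat() route Aaron describes, assuming the default filesystem is HDFS on a build recent enough to ship HDFS-222; the paths below are hypothetical. HDFS also imposes restrictions on concat(), for example on the block sizes of the source files, which may be among the caveats he alludes to.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.hdfs.DistributedFileSystem

    val conf = new Configuration()
    // concat() is HDFS-specific, so get the HDFS implementation directly.
    val fs = FileSystem.get(conf).asInstanceOf[DistributedFileSystem]

    // Hypothetical part files left behind by a previous Spark job.
    val target = new Path("/data/out/part-00000")
    val others = Array(new Path("/data/out/part-00001"),
                       new Path("/data/out/part-00002"))

    // Splices the source files' blocks onto the end of the target file.
    // This is a metadata-only operation: no bytes are copied.
    fs.concat(target, others)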
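And a sketch of Matei's coalesce() suggestion in Spark's Scala API; the data and output path are made up for illustration:

    import org.apache.spark.SparkContext

    // Local context just for the demo.
    val sc = new SparkContext("local", "single-file-demo")
    val rdd = sc.parallelize(1 to 1000).map(_.toString)

    // coalesce(1) funnels all partitions into a single task, so one writer
    // produces one part file. Fine for small data; a bottleneck otherwise.
    rdd.coalesce(1).saveAsTextFile("/data/out")

Note that even with coalesce(1), saveAsTextFile() still writes a directory (here /data/out) containing a single part-00000 file, so the external system would read that part file rather than a bare /data/out file.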
