Hi, all

Thanks for the reply.
I actually need to provide a single file to an external system to process it… seems that I have to make the consumer of the file support multiple inputs.

Best,

--
Nan Zhu

On Tuesday, January 7, 2014 at 12:37 PM, Aaron Davidson wrote:

> HDFS, since 0.21 (https://issues.apache.org/jira/browse/HDFS-222), has a
> concat() method which would do exactly this, but I am not sure of the
> performance implications. Of course, as Matei pointed out, it's unusual to
> actually need a single HDFS file.
>
> On Mon, Jan 6, 2014 at 9:08 PM, Matei Zaharia <[email protected]> wrote:
>
> > Unfortunately this is expensive to do on HDFS — you’d need a single writer
> > to write the whole file. If your file is small enough for that, you can use
> > coalesce() on the RDD to bring all the data to one node, and then save it.
> > However most HDFS applications work with directories containing multiple
> > files instead of single files for this reason.
> >
> > Matei
> >
> > On Jan 6, 2014, at 10:56 PM, Nan Zhu <[email protected]> wrote:
> >
> > > Hi, all
> > >
> > > maybe a stupid question, but is there any way to make Spark write a
> > > single file instead of partitioned files?
> > >
> > > Best,
> > >
> > > --
> > > Nan Zhu
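For reference, a minimal sketch of the concat() route Aaron describes, assuming the default filesystem is HDFS on a build recent enough to ship HDFS-222; the paths below are hypothetical. HDFS also imposes restrictions on concat(), for example on the block sizes of the source files, which may be among the caveats he alludes to.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.hdfs.DistributedFileSystem

    val conf = new Configuration()
    // concat() is HDFS-specific, so get the HDFS implementation directly.
    val fs = FileSystem.get(conf).asInstanceOf[DistributedFileSystem]

    // Hypothetical part files left behind by a previous Spark job.
    val target = new Path("/data/out/part-00000")
    val others = Array(new Path("/data/out/part-00001"),
                       new Path("/data/out/part-00002"))

    // Splices the source files' blocks onto the end of the target file.
    // This is a metadata-only operation: no bytes are copied.
    fs.concat(target, others)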
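And a sketch of Matei's coalesce() suggestion in Spark's Scala API; the data and output path are made up for illustration:

    import org.apache.spark.SparkContext

    // Local context just for the demo.
    val sc = new SparkContext("local", "single-file-demo")
    val rdd = sc.parallelize(1 to 1000).map(_.toString)

    // coalesce(1) funnels all partitions into a single task, so one writer
    // produces one part file. Fine for small data; a bottleneck otherwise.
    rdd.coalesce(1).saveAsTextFile("/data/out")

Note that even with coalesce(1), saveAsTextFile() still writes a directory (here /data/out) containing a single part-00000 file, so the external system would read that part file rather than a bare /data/out file.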
