It seems that coalesce does work; see the following thread: https://www.mail-archive.com/user%40spark.apache.org/msg00928.html
On 5 August 2015 at 09:47, Igor Berman <igor.ber...@gmail.com> wrote:

> Using coalesce might be dangerous, since one worker process will need to
> handle the whole file, and if the file is huge you'll get an OOM. However,
> it depends on the implementation; I'm not sure how it will be done.
> Nevertheless, it's worth trying the coalesce method (please post your results).
>
> Another option would be to use FileUtil.copyMerge, which copies each
> partition one after another into a destination stream (file). So as soon as
> you've written your HDFS file with Spark with multiple partitions in
> parallel (as usual), you can then add another step to merge it into any
> destination you want.
>
> On 5 August 2015 at 07:43, Mohammed Guller <moham...@glassbeam.com> wrote:
>
>> Just to further clarify, you can first call coalesce with argument 1 and
>> then call saveAsTextFile. For example,
>>
>> rdd.coalesce(1).saveAsTextFile(...)
>>
>> Mohammed
>>
>> *From:* Mohammed Guller
>> *Sent:* Tuesday, August 4, 2015 9:39 PM
>> *To:* 'Brandon White'; user
>> *Subject:* RE: Combining Spark Files with saveAsTextFile
>>
>> One option is to use the coalesce method in the RDD class.
>>
>> Mohammed
>>
>> *From:* Brandon White [mailto:bwwintheho...@gmail.com]
>> *Sent:* Tuesday, August 4, 2015 7:23 PM
>> *To:* user
>> *Subject:* Combining Spark Files with saveAsTextFile
>>
>> What is the best way to make saveAsTextFile save as only a single file?
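
For reference, a minimal sketch of both suggestions from the thread. It assumes a SparkContext named sc, placeholder HDFS paths, and a Hadoop 2.x classpath where FileUtil.copyMerge is still available; adjust paths and error handling for your setup.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Option 1: collapse to a single partition before writing.
// The one task that writes the output must process the whole dataset,
// so this can be slow or OOM for very large data, as noted above.
val rdd = sc.textFile("hdfs:///input/path")            // placeholder input
rdd.coalesce(1).saveAsTextFile("hdfs:///output/single")

// Option 2: write with the usual parallelism, then merge the part files
// into one destination file with Hadoop's FileUtil.copyMerge.
rdd.saveAsTextFile("hdfs:///output/parts")

val conf = new Configuration()
val fs   = FileSystem.get(conf)
FileUtil.copyMerge(
  fs, new Path("hdfs:///output/parts"),                // source dir of part-* files
  fs, new Path("hdfs:///output/merged.txt"),           // destination single file
  false,                                               // deleteSource: keep the part files
  conf,
  null)                                                // addString: nothing inserted between parts

Option 2 keeps the write itself parallel and only serializes the final merge, which is usually the safer choice for large outputs.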