Instead of doing this on the compute side, I would just write the RDD out to HDFS as multiple part files initially and then use "hadoop fs -getmerge" or HDFSConcat to produce one final output file.
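
A rough sketch of what I mean (untested -- formatScan() here is a placeholder for however you turn an IScoredScan into a line of text, and the header/footer your writer emits would need separate handling, e.g. small files that sort first/last in the merge):

    import org.apache.spark.api.java.JavaRDD;

    // Stays distributed: each partition writes its own part file,
    // so nothing has to fit in the driver's memory.
    JavaRDD<String> lines = scans.map(scan -> formatScan(scan));
    lines.saveAsTextFile("hdfs:///tmp/scores-parts"); // part-00000, part-00001, ...

    // then, outside Spark:
    //   hadoop fs -getmerge /tmp/scores-parts /local/path/scores.txt
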
- SF

On Fri, Dec 12, 2014 at 11:19 AM, Steve Lewis <lordjoe2...@gmail.com> wrote:
>
> I have an RDD which is potentially too large to store in memory with
> collect. I want a single task to write the contents as a file to HDFS.
> Time is not a large issue but memory is.
> I use the following, converting my RDD (scans) to a local Iterator. This
> works, but hasNext shows up as a separate task and takes on the order of
> 20 sec for a medium-sized job -
> is toLocalIterator a bad function to call in this case, and is there a
> better one?
>
> public void writeScores(final Appendable out, JavaRDD<IScoredScan> scans) {
>     writer.appendHeader(out, getApplication());
>     Iterator<IScoredScan> scanIterator = scans.toLocalIterator();
>     while (scanIterator.hasNext()) {
>         IScoredScan scan = scanIterator.next();
>         writer.appendScan(out, getApplication(), scan);
>     }
>     writer.appendFooter(out, getApplication());
> }