Instead of doing this on the compute side, I would just write the RDD out to HDFS as multiple part files initially and then use "hadoop fs -getmerge" or HDFSConcat to produce one final output file.
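
A rough sketch of what I mean (untested -- formatScan() here is a placeholder for however you turn an IScoredScan into a line of text, and the header/footer your writer emits would need separate handling, e.g. small files that sort first/last in the merge):

    import org.apache.spark.api.java.JavaRDD;

    // Stays distributed: each partition writes its own part file,
    // so nothing has to fit in the driver's memory.
    JavaRDD<String> lines = scans.map(scan -> formatScan(scan));
    lines.saveAsTextFile("hdfs:///tmp/scores-parts"); // part-00000, part-00001, ...

    // then, outside Spark:
    //   hadoop fs -getmerge /tmp/scores-parts /local/path/scores.txt
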
- SF

On Fri, Dec 12, 2014 at 11:19 AM, Steve Lewis <lordjoe2...@gmail.com> wrote:
>
> I have an RDD which is potentially too large to store in memory with
> collect. I want a single task to write the contents as a file to HDFS.
> Time is not a large issue but memory is.
> I use the following, converting my RDD (scans) to a local Iterator. This
> works, but hasNext shows up as a separate task and takes on the order of
> 20 sec for a medium-sized job -
> is toLocalIterator a bad function to call in this case, and is there a
> better one?
>
> public void writeScores(final Appendable out, JavaRDD<IScoredScan> scans) {
>     writer.appendHeader(out, getApplication());
>     Iterator<IScoredScan> scanIterator = scans.toLocalIterator();
>     while (scanIterator.hasNext()) {
>         IScoredScan scan = scanIterator.next();
>         writer.appendScan(out, getApplication(), scan);
>     }
>     writer.appendFooter(out, getApplication());
> }