What would good spill settings be?

On Fri, Dec 12, 2014 at 2:45 PM, Sameer Farooqui <same...@databricks.com> wrote:
>
> You could try re-partitioning or coalescing the RDD to a single partition and
> then write it to disk. Make sure you have good spill settings enabled so that
> the RDD can spill to the local temp dirs if it has to.
>
> On Fri, Dec 12, 2014 at 2:39 PM, Steve Lewis <lordjoe2...@gmail.com>
> wrote:
>>
>> The objective is to let the Spark application generate a file in a format
>> which can be consumed by other programs. As I said, I am willing to give up
>> parallelism at this stage (all the expensive steps were earlier), but I do
>> want an efficient way to pass once through an RDD without the requirement to
>> hold it in memory as a list.
>>
>> On Fri, Dec 12, 2014 at 12:22 PM, Sameer Farooqui <same...@databricks.com>
>> wrote:
>>>
>>> Instead of doing this on the compute side, I would just write out the
>>> file with different blocks initially into HDFS and then use "hadoop fs
>>> -getmerge" or HDFSConcat to get one final output file.
>>>
>>> - SF
>>>
>>> On Fri, Dec 12, 2014 at 11:19 AM, Steve Lewis <lordjoe2...@gmail.com>
>>> wrote:
>>>>
>>>> I have an RDD which is potentially too large to store in memory with
>>>> collect. I want a single task to write the contents as a file to HDFS.
>>>> Time is not a large issue but memory is.
>>>> I use the following, converting my RDD (scans) to a local Iterator. This
>>>> works, but hasNext shows up as a separate task and takes on the order of
>>>> 20 sec for a medium sized job.
>>>> Is toLocalIterator a bad function to call in this case, and is there a
>>>> better one?
>>>>
>>>> public void writeScores(final Appendable out, JavaRDD<IScoredScan> scans) {
>>>>     writer.appendHeader(out, getApplication());
>>>>     Iterator<IScoredScan> scanIterator = scans.toLocalIterator();
>>>>     while (scanIterator.hasNext()) {
>>>>         IScoredScan scan = scanIterator.next();
>>>>         writer.appendScan(out, getApplication(), scan);
>>>>     }
>>>>     writer.appendFooter(out, getApplication());
>>>> }
>>>>
>>
>> --
>> Steven M. Lewis PhD
>> 4221 105th Ave NE
>> Kirkland, WA 98033
>> 206-384-1340 (cell)
>> Skype lordjoe_com

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
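
For readers following the thread, here is a minimal, Spark-free sketch of why hasNext can show up as its own unit of work in the writeScores loop quoted above. It assumes that toLocalIterator pulls one partition at a time to the driver, so the fetch cost is paid whenever iteration crosses a partition boundary, and the driver only needs to hold the current partition. Everything in it (PartitionStreamSketch, localIterator, writeAll, fetchCount, and the Supplier-based "partitions") is hypothetical and only models that behavior; it is not Spark's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

/** Spark-free model of partition-at-a-time iteration (all names are hypothetical). */
public class PartitionStreamSketch {

    /** Counts partition fetches; stands in for the per-partition work seen in the thread. */
    static int fetchCount = 0;

    /** Iterates over every element while holding only one partition's list at a time. */
    static <T> Iterator<T> localIterator(final List<Supplier<List<T>>> partitions) {
        return new Iterator<T>() {
            private int nextPartition = 0; // index of the next partition to fetch
            private Iterator<T> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Crossing a partition boundary is where the fetch cost is paid.
                while (!current.hasNext() && nextPartition < partitions.size()) {
                    current = partitions.get(nextPartition++).get().iterator();
                    fetchCount++;
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    /** Same shape as writeScores in the thread: header, one pass over the records, footer. */
    static void writeAll(Appendable out, Iterator<String> records) {
        try {
            out.append("HEADER\n");
            while (records.hasNext()) {
                out.append(records.next()).append('\n');
            }
            out.append("FOOTER\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Three stand-in partitions; the middle one is empty and is skipped transparently.
        List<Supplier<List<String>>> partitions = new ArrayList<>();
        partitions.add(() -> Arrays.asList("scan1", "scan2"));
        partitions.add(() -> Collections.<String>emptyList());
        partitions.add(() -> Arrays.asList("scan3"));

        StringBuilder out = new StringBuilder();
        writeAll(out, localIterator(partitions));
        System.out.print(out);
        System.out.println("partition fetches: " + fetchCount);
    }
}
```

Under this model the number of fetches equals the number of partitions, so fewer, larger partitions would mean fewer fetches, at the price of holding a larger partition in driver memory at once.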