rdd.toLocalIterator will do almost what you want, but it requires that each
individual partition fit in the driver's memory (rather than each individual
line). Hopefully that's sufficient, though.
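
For what it's worth, here is a rough sketch of how that could look from
PySpark, assuming your RDD holds strings. The helper name write_gzip_lines
and the output path are made up for illustration; the function accepts any
iterable of strings, so the Spark call itself only appears in the trailing
comment and the snippet runs without a cluster.

```python
import gzip
from typing import Iterable


def write_gzip_lines(lines: Iterable[str], path: str) -> int:
    """Stream lines into a single gzip file one at a time; return the line count."""
    count = 0
    # "wt" opens the gzip stream in text mode, so each str is encoded as it is written.
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for line in lines:
            out.write(line)
            out.write("\n")
            count += 1
    return count


# Assumed usage with Spark, where `rdd` is an existing RDD of strings:
#     write_gzip_lines(rdd.toLocalIterator(), "/tmp/output.gz")
```

Only one line is held by the writer at a time, but Spark still pulls one
whole partition at a time to the driver, so the partition size limit
mentioned above still applies.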


On Fri, Aug 1, 2014 at 1:38 AM, Andrei <faithlessfri...@gmail.com> wrote:

> Is there a way to get iterator from RDD? Something like rdd.collect(), but
> returning lazy sequence and not single array.
>
> Context: I need to GZip processed data to upload it to Amazon S3. Since
> archive should be a single file, I want to iterate over RDD, writing each
> line to a local .gz file. File is small enough to fit local disk, but still
> large enough not to fit into memory.
>
