Re: Save RDDs as CSV

Stephen Haberman Wed, 30 Oct 2013 21:53:15 -0700

> Doing a coalesce will be kind of a problem... I was hoping there would
> be a utility or command option that could concat all the files
> together for me...


If you call rdd.coalesce(1, shuffle = true), rdd itself is still
computed in parallel, with each partition's output written to disk
as shuffle data. Only the final saveAsTextFile task is non-parallel:
it pulls in each upstream partition's output in turn and writes it
to the single output file.

In other words, coalesce(1, shuffle = true) is, for all intents and
purposes, concat.
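
To make that concrete, here is a minimal sketch of the pattern. The
input data, the local master setting, the object name, and the output
path are all made up for illustration, and the SparkConf-style setup
assumes a Spark release that provides SparkConf; adjust to your own job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SaveSingleCsv {
  def main(args: Array[String]): Unit = {
    // Assumption: local master with 4 threads, purely for illustration.
    val conf = new SparkConf().setAppName("SaveSingleCsv").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical input: (id, name) records spread over 8 partitions.
      val rows = sc.parallelize(1 to 1000, numSlices = 8).map(i => (i, s"name-$i"))

      // Naive CSV formatting (no quoting or escaping). This map runs in
      // parallel, one task per upstream partition.
      val lines = rows.map { case (id, name) => s"$id,$name" }

      // shuffle = true puts a shuffle boundary here, so the work above keeps
      // its 8-way parallelism; only the task after the shuffle is single.
      // Placeholder path; saveAsTextFile fails if the directory already exists.
      lines.coalesce(1, shuffle = true).saveAsTextFile("/tmp/rdd-as-csv")
    } finally {
      sc.stop()
    }
  }
}
```

The output is still a directory, but it should hold a single part file
(part-00000) with all the lines in it.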

Or is there a reason you would not find this sufficient?

- Stephen
