> Doing a coalesce will be kind of a problem... I was hoping that would
> be a utility or command option that could concat all the files
> together for me...
If you do rdd.coalesce(1, shuffle = true), rdd itself is still computed in parallel, with each partition's output written to disk. Only the final saveAsTextFile task is non-parallel: it sequentially pulls in each upstream partition's output and writes it to the single output file.

In other words, coalesce(1, shuffle = true) is, for all intents and purposes, concat. Or is there a reason you would not find this sufficient?

- Stephen
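P.S. In case it helps, here is a minimal sketch of the pattern, not a definitive implementation. The object and method names (CoalesceToSingleFile, writeSingleFile), the local[4] master, the temp output directory, and the toUpperCase step standing in for your real work are all assumptions made for illustration; only coalesce(1, shuffle = true) and saveAsTextFile come from the discussion above.

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; the names here are illustrative only.
object CoalesceToSingleFile {

  // Runs a parallel transformation, then funnels the result through a
  // single partition so that saveAsTextFile writes one part file.
  // Returns the names of the part files found under outDir.
  def writeSingleFile(sc: SparkContext, lines: Seq[String], outDir: String): Seq[String] = {
    val rdd = sc
      .parallelize(lines, numSlices = 4) // four upstream partitions
      .map(_.toUpperCase)                // stand-in for the real, parallel work

    // shuffle = true keeps the upstream stage at four partitions;
    // only the final write runs as a single task.
    rdd.coalesce(1, shuffle = true).saveAsTextFile(outDir)

    // Ignore _SUCCESS and the hidden .crc checksum files.
    new File(outDir).listFiles.map(_.getName).filter(_.startsWith("part-")).sorted.toSeq
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("coalesce-demo").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      // saveAsTextFile refuses to overwrite, so point it at a fresh path.
      val outDir = Files.createTempDirectory("coalesce-demo").resolve("out").toString
      println(writeSingleFile(sc, Seq("a", "b", "c", "d"), outDir))
    } finally {
      sc.stop()
    }
  }
}
```

The output directory should end up holding a single part file, which is the concatenated result you were after.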
