Reading multiple S3 objects, transforming, writing back one

Peter Wed, 30 Apr 2014 11:16:24 -0700

Hi

Playing around with Spark & S3, I'm opening multiple objects (CSV files) with:

    val hfile = sc.textFile("s3n://bucket/2014-04-28/")

So hfile is an RDD representing the 10 objects that were "underneath" 2014-04-28. 
After sorting and otherwise transforming the content, I'm trying to write it 
back to a single object:

    sortedMap.values.map(_.mkString(",")).saveAsTextFile("s3n://bucket/concatted.csv")

Unfortunately, this results in a "folder" named concatted.csv with 10 objects 
underneath, part-00000 .. part-00009, corresponding to the 10 original objects 
loaded. 
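For illustration, here is a minimal sketch of the behaviour described above, assuming that (as is typical for small input files) textFile gives one partition per input object, and that saveAsTextFile writes one part file per partition. To stay runnable without S3, it uses sc.parallelize with 10 slices as a stand-in for the 10 objects and a local temp directory instead of the bucket; the names CoalesceSketch, run and partFiles are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceSketch {
  // Names of the part files Spark wrote into an output directory
  // (hidden .crc checksum files start with "." and are filtered out).
  def partFiles(dir: String): Array[String] =
    new java.io.File(dir).listFiles()
      .map(_.getName)
      .filter(_.startsWith("part-"))
      .sorted

  // Returns (partition count, part files without coalesce, part files with coalesce(1)).
  def run(): (Int, Array[String], Array[String]) = {
    val sc = new SparkContext(
      new SparkConf().setAppName("coalesce-sketch").setMaster("local[2]"))
    try {
      // Stand-in for the 10 input objects: an RDD with 10 partitions.
      val lines = sc.parallelize((1 to 20).map(i => s"$i,value$i"), 10)

      // Fresh temp dir; saveAsTextFile fails if the target path already exists.
      val base = Files.createTempDirectory("coalesce-sketch").toString
      val perPartition = s"$base/uncoalesced.csv"
      val single = s"$base/concatted.csv"

      // One part file per partition, matching what the post reports.
      lines.saveAsTextFile(perPartition)

      // coalesce(1) merges all partitions into one without a shuffle,
      // so only a single part file is written.
      lines.coalesce(1).saveAsTextFile(single)

      (lines.partitions.length, partFiles(perPartition), partFiles(single))
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (n, many, one) = run()
    println(s"partitions: $n")
    println(s"without coalesce: ${many.mkString(" ")}")
    println(s"with coalesce(1): ${one.mkString(" ")}")
  }
}
```

Note that even with coalesce(1), saveAsTextFile still creates a directory (here concatted.csv/part-00000); producing a plain object named concatted.csv would need an extra copy or rename step afterwards, which this sketch does not attempt. For sorted data, coalesce(1) is the safer choice over repartition(1): repartition always shuffles, and a shuffle does not keep the sort order. Either way, all output goes through a single task, which is only reasonable for modest result sizes.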

How can I get the desired behaviour, i.e. a single object named 
concatted.csv?

I've tried 0.9.1 and 1.0.0-RC3. 

Thanks!
Peter
