Hi Ramkumar,
I don't think there's a good way to give them different names other than
opening and writing the files yourself. You could do that with a foreach(). For
example, suppose you created an RDD of records (say (key, listOfValues)) and
you wanted to save each one to a different file based on the key. You could do
something like this (the output directory below is just a placeholder):
records.foreach { case (key, values) =>
  // open a FileOutputStream for this key (placeholder output directory)
  val out = new java.io.PrintWriter(new java.io.FileOutputStream("/some/output/dir/" + key))
  // write the values, one per line
  values.foreach(v => out.println(v))
  out.close()
}
You can access HDFS directly through the FileSystem class in Hadoop:
http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/fs/FileSystem.html.
Just use FileSystem.get(uri, configuration) to get the FileSystem object.
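Putting the two together, the same foreach could write straight to HDFS; roughly
like this (the namenode URI and output directory are just placeholders):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

records.foreach { case (key, values) =>
  // placeholder namenode URI and output directory -- substitute your own
  val fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration())
  val out = fs.create(new Path("/output/dir/" + key))
  values.foreach(v => out.write((v.toString + "\n").getBytes("UTF-8")))
  out.close()
}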
Otherwise, if that doesn't work, you can also rename the part-X files through
the same FileSystem API above.
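For example, roughly (the paths are placeholders):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// placeholder paths -- use the directory you passed to saveAsTextFile
val fs = FileSystem.get(new Configuration())
fs.rename(new Path("/output/dir/part-00000"), new Path("/output/dir/myOutput.txt"))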
Matei
On Oct 10, 2013, at 4:18 AM, Ramkumar Chokkalingam <[email protected]>
wrote:
> Hello,
>
> I'm reading multiple files, parsing them, and writing to an output file. As I
> see it, saveAsTextFile takes the output path and emits the output under the
> directory we specify as files named part-00000, part-00001, etc., depending on
> the number of partitions used (similar to Hadoop). But is there a way to make
> all the input files be emitted into a single output folder? Also, do we have
> control over the output file names (a different name rather than part-0000's)?