Hi Ramkumar,

I don't think there's a good way to give them different names other than 
opening and writing the files yourself. You could do that with a foreach(). For 
example, suppose you created an RDD of records (say (key, listOfValues)) and 
you wanted to save each one to a different file based on the key. You could do

records.foreach { case (key, values) =>
  val out = // open a FileOutputStream for this key
  // write values to out
  out.close()
}

You can access HDFS directly through the FileSystem class in Hadoop: 
http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/fs/FileSystem.html. 
Just use FileSystem.get(uri, configuration) to get the FileSystem object.
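Putting those two pieces together, a rough sketch might look like this (the 
HDFS URI and the "/output" directory here are placeholders for your setup, and 
you'd want to add error handling):

```scala
import java.io.PrintWriter
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

records.foreach { case (key, values) =>
  // FileSystem.get caches handles internally, so calling it per record is cheap
  val fs = FileSystem.get(new URI("hdfs://namenode:9000"), new Configuration())
  // Name the output file after the key
  val out = new PrintWriter(fs.create(new Path("/output/" + key + ".txt")))
  values.foreach(out.println)
  out.close()
}
```

Note that this runs on the workers, so each task needs access to HDFS, and two 
records with the same key in different partitions would clobber each other's 
file; you may want a reduceByKey or groupByKey first so each key appears once.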

Otherwise, if that doesn't work, you can also rename the part-X files through 
the same FileSystem API above.
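For example, after saveAsTextFile finishes, you could do something like this 
on the driver (again, the URI and paths are placeholders):

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new URI("hdfs://namenode:9000"), new Configuration())
// Find all the part-XXXXX files Spark wrote and rename each one
val parts = fs.globStatus(new Path("/output/part-*"))
parts.zipWithIndex.foreach { case (status, i) =>
  fs.rename(status.getPath, new Path("/output/result-" + i + ".txt"))
}
```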

Matei

On Oct 10, 2013, at 4:18 AM, Ramkumar Chokkalingam <[email protected]> 
wrote:

> Hello, 
> 
> I'm reading multiple files, parsing them, and writing to an output 
> file. As I see it, saveAsTextFile takes the output path and emits the output 
> under the directory we specify as files named part-00000, part-00001, etc., 
> depending on the number of partitions used (similar to Hadoop). But is there a 
> way to make all your input files be emitted into a single output 
> folder? Also, do we have control over the output file name (a different name 
> rather than part-0000's)?
> 
> 
