Fwd: how to split RDD by key and save to different path

2014-08-12 Thread Fengyun RAO
1. Be careful: HDFS is better suited to large files than to many small files. 2. If that's really what you want, roll your own. def writeLines(iterator: Iterator[(String, String)]) = { val writers = new mutable.HashMap[String, BufferedWriter] // (key, writer) map try { while
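
The snippet above is cut off at `while`, so the rest of the author's code is not available here. As a rough sketch of the same idea (one writer per key, created lazily and closed at the end), something like the following might work. It is not the author's code: the `SplitByKey` object, the `baseDir` parameter, and the `part.txt` file name are assumptions made for illustration, and it writes to the local filesystem through `java.io` rather than to HDFS.

```scala
import java.io.{BufferedWriter, File, FileWriter}
import scala.collection.mutable

object SplitByKey {
  // Hypothetical helper: appends each (key, line) pair to <baseDir>/<key>/part.txt.
  // baseDir and the part.txt file name are illustrative assumptions.
  def writeLines(baseDir: String, iterator: Iterator[(String, String)]): Unit = {
    val writers = mutable.HashMap.empty[String, BufferedWriter] // (key, writer) map
    try {
      while (iterator.hasNext) {
        val (key, line) = iterator.next()
        // Open a writer the first time a key is seen, then reuse it.
        val writer = writers.getOrElseUpdate(key, {
          val dir = new File(baseDir, key)
          dir.mkdirs()
          new BufferedWriter(new FileWriter(new File(dir, "part.txt"), true))
        })
        writer.write(line)
        writer.newLine()
      }
    } finally {
      // Close every writer, even if writing failed part way through.
      writers.values.foreach(_.close())
    }
  }
}
```

With Spark, a function of this shape would typically be passed to `foreachPartition` on an RDD of (key, line) pairs. In that setting each partition would need its own file name per key to avoid several tasks writing to the same file, and writing to HDFS would go through the Hadoop `FileSystem` API instead of `java.io.File`.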

Re: how to split RDD by key and save to different path

2014-08-12 Thread 诺铁
Understood, thank you. Small files are a problem; I am considering processing the data before putting it in HDFS. On Tue, Aug 12, 2014 at 9:37 PM, Fengyun RAO raofeng...@gmail.com wrote: 1. Be careful: HDFS is better suited to large files than to many small files. 2. If that's really what you want, roll

how to split RDD by key and save to different path

2014-08-11 Thread 诺铁
Hi, I have googled and found a similar question without a good answer: http://stackoverflow.com/questions/24520225/writing-to-hadoop-distributed-file-system-multiple-times-with-spark In short, I would like to separate raw data and divide it by some key, for example creation date, and put them in directory