Better to use coalesce instead of repartition.

On Fri, Oct 20, 2017 at 9:47 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
> Use counts.repartition(1).save......
> Hth
>
> On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" <usopao...@gmail.com> wrote:
>
> Actually, when I run the following code,
>
>   val textFile = sc.textFile("Sample.txt")
>   val counts = textFile.flatMap(line => line.split(" "))
>     .map(word => (word, 1))
>     .reduceByKey(_ + _)
>
> it saves the results into more than one partition, like part-00000 and
> part-00001. I want to collect all of them into one file.
>
> 2017-10-20 16:43 GMT+03:00 Marco Mistroni <mmistr...@gmail.com>:
>
>> Hi,
>> Could you just create an RDD/DF out of what you want to save and store
>> it in HDFS?
>> Hth
>>
>> On Oct 20, 2017 9:44 AM, "Uğur Sopaoğlu" <usopao...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> In the word count example,
>>>
>>>   val textFile = sc.textFile("Sample.txt")
>>>   val counts = textFile.flatMap(line => line.split(" "))
>>>     .map(word => (word, 1))
>>>     .reduceByKey(_ + _)
>>>   counts.saveAsTextFile("hdfs://master:8020/user/abc")
>>>
>>> I want to write the collection "counts" used in the code above to
>>> HDFS, so:
>>>
>>>   val x = counts.collect()
>>>
>>> Actually, I want to write x to HDFS, but Spark wants an RDD to write
>>> something to HDFS.
>>>
>>> How can I write an Array[(String, Int)] to HDFS?
>>>
>>> --
>>> Uğur
>>
>
> --
> Uğur Sopaoğlu

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
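Putting the suggestions from this thread together, here is a minimal Scala sketch. It assumes a live SparkContext `sc` and reuses the input file and HDFS paths from the thread as placeholders (the `_single` output path is made up for illustration):

```scala
// Word count as in the original question.
val textFile = sc.textFile("Sample.txt")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// coalesce(1) merges existing partitions without a full shuffle, so it is
// usually cheaper than repartition(1) when only *reducing* the partition
// count. The output directory will then contain a single part-00000 file.
counts.coalesce(1).saveAsTextFile("hdfs://master:8020/user/abc")

// To write an already-collected Array[(String, Int)], turn it back into an
// RDD first, since saveAsTextFile is defined on RDDs, not on local arrays.
val x = counts.collect()
sc.parallelize(x, numSlices = 1)
  .saveAsTextFile("hdfs://master:8020/user/abc_single")
```

Note that collecting to the driver and re-parallelizing only makes sense for small results; for large data, `coalesce(1)` (or leaving multiple part files and merging with `hdfs dfs -getmerge`) avoids pulling everything through driver memory.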