Which Spark release are you using? Are you running in standalone mode?
Cheers

On Tue, Jun 30, 2015 at 10:03 AM, hotdog <lisend...@163.com> wrote:
> I'm running reduceByKey in Spark. My program is the simplest example of
> Spark:
>
>     val counts = textFile.flatMap(line => line.split(" "))
>       .repartition(20000)
>       .map(word => (word, 1))
>       .reduceByKey(_ + _, 10000)
>     counts.saveAsTextFile("hdfs://...")
>
> but it always runs out of memory...
>
> I'm using 50 servers, 35 executors per server, 140GB of memory per server.
>
> The data volume is 8TB: 20 billion documents, 1000 billion words in
> total, and there will be about 100 million distinct words after the
> reduce.
>
> How should I configure Spark? What values should these parameters take?
>
> 1. the number of map tasks? 20000, for example?
> 2. the number of reduce tasks? 10000, for example?
> 3. other parameters?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/run-reduceByKey-on-huge-data-in-spark-tp23546.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
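For what it's worth, a minimal sketch of the job with the stray dot before `.map` removed. One hedged observation: since `reduceByKey` already takes a partition count, the separate `repartition(20000)` adds a full extra shuffle of the 8TB input, which may be a large part of the memory pressure. The partition values (20000/10000) and the HDFS paths below are placeholders from the original post, not tuned recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the word-count job above; assumes it is submitted to a cluster
// with spark-submit. Paths and partition counts are the poster's placeholders.
object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

    val textFile = sc.textFile("hdfs://.../input")

    val counts = textFile
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      // reduceByKey's numPartitions argument already controls the shuffle
      // partitioning, so the earlier repartition(20000) is dropped here to
      // avoid shuffling the full input an extra time.
      .reduceByKey(_ + _, 10000)

    counts.saveAsTextFile("hdfs://.../output")
    sc.stop()
  }
}
```

Map-side parallelism then comes from the input splits of the 8TB dataset rather than from an explicit repartition; whether 10000 reduce partitions is enough for ~100 million distinct keys depends on executor memory settings, which is why the release and deploy mode matter.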