Re: Configuring Spark for reduceByKey on on massive data sets

2015-10-11 Thread hotdog
Hi Daniel, did you solve your problem? I hit the same problem when running reduceByKey over a massive data set on YARN. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-Spark-for-reduceByKey-on-on-massive-data-sets-tp5966p25023.html Sent from the

Re: Configuring Spark for reduceByKey on on massive data sets

2014-05-18 Thread Daniel Mahler
> > ... roughly 30 nodes.
> > I was planning to test it with Spark.
> > I'm very interested in your findings.
> >
> > Madhu
> > https://www.linkedin.com/in/msiddalingaiah

Re: Configuring Spark for reduceByKey on on massive data sets

2014-05-18 Thread lukas nalezenec

Re: Configuring Spark for reduceByKey on on massive data sets

2014-05-17 Thread Matei Zaharia
> I'm very interested in your findings.
>
> Madhu
> https://www.linkedin.com/in/msiddalingaiah

Re: Configuring Spark for reduceByKey on on massive data sets

2014-05-17 Thread Madhu
https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-Spark-for-reduceByKey-on-on-massive-data-sets-tp5966p5967.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Configuring Spark for reduceByKey on on massive data sets

2014-05-17 Thread Daniel Mahler
I have had a lot of success with Spark on large datasets, both in terms of performance and flexibility. However, I hit a wall with reduceByKey when the RDD contains billions of items. I am reducing with simple functions like addition for building histograms, so the reduction process should be consta
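
For readers following the thread, the pattern described here (reducing with addition to build a histogram) can be sketched in plain Python without Spark. This is only an illustration under stated assumptions: the data is assumed to be already split into partitions, and `combine_partition`, `merge_partials` and `reduce_by_key` are hypothetical names, not Spark APIs. The sketch mimics the general shape of reduceByKey (combine values per key inside each partition, then merge the partial results); it is not the poster's code.

```python
from functools import reduce
from operator import add


def combine_partition(pairs, func):
    """Reduce values per key within a single partition (local combine step)."""
    partial = {}
    for key, value in pairs:
        partial[key] = func(partial[key], value) if key in partial else value
    return partial


def merge_partials(left, right, func):
    """Merge two per-partition results, reducing values that share a key."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def reduce_by_key(partitions, func):
    """Hypothetical stand-in for reduceByKey over a list of partitions."""
    partials = [combine_partition(p, func) for p in partitions]
    return reduce(lambda a, b: merge_partials(a, b, func), partials, {})


# Histogram: emit (bucket, 1) for every item, then sum the ones per bucket.
# The partition contents below are made-up sample data.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("b", 1), ("c", 1)],
]
histogram = reduce_by_key(partitions, add)
```

In this sketch each partial result holds one entry per distinct key seen in its partition, so what gets merged across partitions depends on the number of distinct buckets rather than on the number of input items.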