Hi Daniel,
Did you solve your problem? I ran into the same problem when running
reduceByKey on massive data on YARN.
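For reference, a minimal sketch of the kind of call involved; reduceByKey
also accepts an explicit numPartitions argument, which is one of the obvious
knobs at this scale. The app name, the sample data, and the partition count
4096 below are illustrative only:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits on older Spark versions

    object ReduceByKeySketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("reduceByKey-sketch"))
        // Tiny stand-in for a pair RDD that would hold billions of items.
        val counts = sc.parallelize(Seq(("a", 1L), ("b", 1L), ("a", 1L)))
        // The second argument sets the number of reduce-side partitions;
        // raising it spreads the shuffle across more, smaller tasks.
        val reduced = counts.reduceByKey(_ + _, 4096)
        reduced.collect().foreach(println)
        sc.stop()
      }
    }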
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-Spark-for-reduceByKey-on-on-massive-data-sets-tp5966p25023.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
> > [...]; roughly 30 nodes.
> > I was planning to test it with Spark.
> > I'm very interested in your findings.
> >
> >
> >
> > -
> > Madhu
> > https://www.linkedin.com/in/msiddalingaiah
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-Spark-for-reduceByKey-on-on-massive-data-sets-tp5966p5967.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
I have had a lot of success with Spark on large datasets,
both in terms of performance and flexibility.
However, I hit a wall with reduceByKey when the RDD contains billions of
items.
I am reducing with simple functions like addition for building histograms,
so the reduction process should be constant memory.
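Concretely, the job boils down to something like the sketch below; the input
path and the bucketing function are made up for illustration:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits on older Spark versions

    object HistogramSketch {
      // Hypothetical bucketing function: map a value to its histogram bin.
      def bucketOf(x: Double): Long = math.floor(x / 10.0).toLong

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("histogram-sketch"))
        val values = sc.textFile("hdfs:///path/to/input").map(_.toDouble)
        // Each record contributes a count of 1 to its bucket; reduceByKey
        // then sums the counts per bucket with plain addition, so each
        // partial aggregate stays a single Long per key.
        val histogram = values
          .map(x => (bucketOf(x), 1L))
          .reduceByKey(_ + _)
        histogram.collect().foreach(println)
        sc.stop()
      }
    }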