Daniel,

How many partitions do you have? Are they more or less uniformly distributed?

We have a similar data volume currently running well on Hadoop MapReduce with roughly 30 nodes. I was planning to test it with Spark, so I'm very interested in your findings.
-----
Madhu
https://www.linkedin.com/in/msiddalingaiah