We are trying to implement an iterative algorithm. During each iteration, we need to solve the problem raised in the first post, and the size of the second RDD may change from 4500000 to 40000.
