Thank you for your reply.
   We have tried this method before, but step 2 is very time consuming because
the number of values per key is not well distributed. Some keys in the lines
of RDD1 are very dense, while others are very sparse. After the join, the
partitions containing the dense keys are very large and slow to process. We
don't know how to solve this skew. Do you have a more efficient way?


   Step 2: join RDD1 and RDD2 => RDD1+2
    ("1",("L1",11))
    ("2",("L1",22))
    ("3",("L1",33))
    ("1",("L2",11))
    ("3",("L2",33))
    ("5",("L2",55))
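One common way to handle this kind of join skew is to "salt" the keys: append a random suffix to each record on the dense side, replicate the small side once per suffix so every salted key still finds its match, join, then strip the salt. The sketch below illustrates the idea in plain Python rather than the Spark API; the `salted_join` helper, the `SALT_BUCKETS` constant, and the sample data are my own illustrative names, not anything from your job.

```python
# Illustration of key salting to spread a skewed join key across buckets.
# Plain Python stands in for the RDD operations; in Spark you would apply
# the same map/flatMap transformations before calling join().
import random
from collections import defaultdict

SALT_BUCKETS = 4  # hypothetical bucket count; tune to your skew


def salted_join(dense, small):
    """Join two lists of (key, value) pairs after salting the dense side.

    dense: the skewed side; each record gets one random salt suffix, so a
           hot key is spread across up to SALT_BUCKETS partitions.
    small: the other side; each record is replicated once per bucket so
           every salted key on the dense side still finds its match.
    """
    left = [(f"{k}#{random.randrange(SALT_BUCKETS)}", v) for k, v in dense]
    right = [(f"{k}#{s}", v) for k, v in small for s in range(SALT_BUCKETS)]

    # Build a lookup table for the (replicated) small side.
    table = defaultdict(list)
    for k, v in right:
        table[k].append(v)

    # Join on the salted key, then strip the salt back off.
    return [(k.split("#")[0], (v1, v2))
            for k, v1 in left
            for v2 in table[k]]


dense_side = [("1", 11), ("2", 22), ("3", 33)]
small_side = [("1", "L2"), ("3", "L2"), ("5", "L2")]
joined = salted_join(dense_side, small_side)
# keys "1" and "3" match; "2" and "5" have no partner and drop out
```

The trade-off is that the small side is replicated SALT_BUCKETS times, so this only pays off when one side is much smaller than the skewed one; if both sides are large, salting only the handful of known hot keys (and joining the rest normally) is the usual refinement.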

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-efficiently-join-this-two-complicated-rdds-tp1665p1728.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
