Thank you for your reply.
   We have tried this method before, but step 2 is very time consuming because
the number of values per key is not well distributed. Some keys in the lines
of RDD1 are very dense, while others are very sparse. After the join, the
partitions containing the dense keys are very large and slow to process. We
don't know how to solve this skew. Do you have a more efficient way?


   Step 2: join RDD1 and RDD2 => RDD1+2
    ("1",("L1",11))
    ("2",("L1",22))
    ("3",("L1",33))
    ("1",("L2",11))
    ("3",("L2",33))
    ("5",("L2",55))
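One common way to handle this kind of join skew is to "salt" the keys: append a random suffix to each record on the dense side, replicate the small side once per suffix so every salted key still finds its match, join, then strip the salt. The sketch below illustrates the idea in plain Python rather than the Spark API; the `salted_join` helper, the `SALT_BUCKETS` constant, and the sample data are my own illustrative names, not anything from your job.

```python
# Illustration of key salting to spread a skewed join key across buckets.
# Plain Python stands in for the RDD operations; in Spark you would apply
# the same map/flatMap transformations before calling join().
import random
from collections import defaultdict

SALT_BUCKETS = 4  # hypothetical bucket count; tune to your skew


def salted_join(dense, small):
    """Join two lists of (key, value) pairs after salting the dense side.

    dense: the skewed side; each record gets one random salt suffix, so a
           hot key is spread across up to SALT_BUCKETS partitions.
    small: the other side; each record is replicated once per bucket so
           every salted key on the dense side still finds its match.
    """
    left = [(f"{k}#{random.randrange(SALT_BUCKETS)}", v) for k, v in dense]
    right = [(f"{k}#{s}", v) for k, v in small for s in range(SALT_BUCKETS)]

    # Build a lookup table for the (replicated) small side.
    table = defaultdict(list)
    for k, v in right:
        table[k].append(v)

    # Join on the salted key, then strip the salt back off.
    return [(k.split("#")[0], (v1, v2))
            for k, v1 in left
            for v2 in table[k]]


dense_side = [("1", 11), ("2", 22), ("3", 33)]
small_side = [("1", "L2"), ("3", "L2"), ("5", "L2")]
joined = salted_join(dense_side, small_side)
# keys "1" and "3" match; "2" and "5" have no partner and drop out
```

The trade-off is that the small side is replicated SALT_BUCKETS times, so this only pays off when one side is much smaller than the skewed one; if both sides are large, salting only the handful of known hot keys (and joining the rest normally) is the usual refinement.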

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-efficiently-join-this-two-complicated-rdds-tp1665p1728.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
