|
Actually, even without the skewness
problem, the solution I've proposed is really not efficient, since
it generates a lot of data. Since what you have is very close to a
sparse matrix * sparse vector computation, in my opinion, you
should split your data in blocks and do the join on blocks.
Have a look at how the implementation of ALS does it, I think it should be efficient for your problem. Does RDD1 change ? If not, then you can prepare your blocks once, and iterate just like in ALS. Also, since you don't have to perform a computation on each row, you can split the sparse matrix both rowwise and columnwise. Guillaume Thank you for your reply. we have tried this method before, but step 2 is very time cosuming due to the value number of different keys is not well-distributed. Some key in lines of RDD1 is very dense, but others are very sparse. After join, the splits containing dense keys is very large and time consuming. We don't know how to solve this then. Do you have more efficient way? --
|
- How to efficiently join this two complicated rdds hanbo
- Re: How to efficiently join this two complicated rdds Eugen Cepoi
- Re: How to efficiently join this two complicated rdds Guillaume Pitel
- Re: How to efficiently join this two complicated ... Eugen Cepoi
- Re: How to efficiently join this two complicated ... hanbo
- Re: How to efficiently join this two complica... Guillaume Pitel

