Hi Andrew,

Thanks for your suggestion. I tried that approach on 8 nodes with 8 GB of
memory each. The job stalled at one stage for several hours without
producing any further output, so I probably need to find a more
efficient way.
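
For reference, a minimal single-machine sketch of the naive approach in
plain Python (not the actual Spark job). It mirrors the cartesian product
followed by a per-user top k, and adds a rough cost estimate for 1M users.
The cosine `similarity` function, the toy `users` data, and the throughput
passed to `estimate_hours` are all hypothetical, chosen only for
illustration.

```python
import heapq
from itertools import product


def similarity(a, b):
    """Hypothetical score: cosine similarity of two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(users, k):
    """Score every ordered pair of users, then keep the k best per user.

    users: dict mapping user id -> attribute vector.
    Returns a dict mapping user id -> list of (score, other_user),
    best first.
    """
    scored = {}
    # The cartesian product of the user set with itself: n * n pairs.
    for (u, a), (v, b) in product(users.items(), repeat=2):
        if u != v:  # skip comparing a user with itself
            scored.setdefault(u, []).append((similarity(a, b), v))
    return {u: heapq.nlargest(k, pairs) for u, pairs in scored.items()}


def estimate_hours(n_users, nodes, scores_per_sec_per_node):
    """Rough wall-clock estimate for scoring all n_users * n_users pairs.

    The throughput argument is an assumption, not a measured figure.
    """
    n_pairs = n_users * n_users
    return n_pairs / (nodes * scores_per_sec_per_node) / 3600.0


# Toy data: three users with two attributes each.
users = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
result = top_k_similar(users, k=1)

# 1M users on 8 nodes, assuming 1M scores per second per node.
hours = estimate_hours(1_000_000, 8, 1_000_000)
```

Under that assumed throughput the 1T pairs would take on the order of a
day and a half even before shuffle and memory overhead, which would be
consistent with a stage that appears to hang for hours.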


On Fri, Apr 11, 2014 at 5:24 PM, Andrew Ash <and...@andrewash.com> wrote:

> The naive way would be to put all the users and their attributes into an
> RDD, then cartesian product that with itself.  Run the similarity score on
> every pair (1M * 1M => 1T scores), map to (user, (score, otherUser)) and
> take the .top(k) for each user.
>
> I doubt that you'll be able to take this approach with the 1T pairs
> though, so it might be worth looking at the literature for recommender
> systems to see what else is out there.
>
>
> On Fri, Apr 11, 2014 at 9:54 PM, Xiaoli Li <lixiaolima...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am implementing an algorithm using Spark. I have one million users, and I
>> need to compute the similarity between each pair of users based on some of
>> their attributes. For each user, I need to get the top k most similar users.
>> What is the best way to implement this?
>>
>>
>> Thanks.
>>
>
>
