I don't see the potential connection's contact list among your rules. Overlap between the two members' connection lists should be an extremely strong signal.
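
As a rough illustration of that overlap signal, here is a minimal Python sketch. It assumes each member's contacts are available as a list of ids; the function name `contact_overlap` and the sample ids are made up for the example, not taken from your system or from Mahout.

```python
def contact_overlap(contacts_a, contacts_b):
    """Return (shared_count, jaccard) for two members' contact lists.

    Both arguments are iterables of contact ids (a hypothetical
    representation; any hashable id works).
    """
    a, b = set(contacts_a), set(contacts_b)
    shared = len(a & b)
    union = len(a | b)
    # Jaccard similarity: shared contacts divided by all distinct contacts.
    jaccard = shared / union if union else 0.0
    return shared, jaccard


# Illustrative ids only.
member = ["ann", "bob", "cara", "dev"]
candidate = ["bob", "cara", "eli"]
print(contact_overlap(member, candidate))  # (2, 0.4)
```

Both the raw count and the normalized ratio are worth keeping as separate predictors, since a member with a very large contact list will share a few contacts with almost anyone.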
You are correct that this tends to be implemented as a classification problem. The target variable is a binary variable that indicates whether the person knows or does not know the potential connection. Predictor variables include what you have described as well as many variants of the same.

On Tue, Jul 23, 2013 at 9:28 PM, Jason Lee <[email protected]> wrote:

> Hi all,
>
> Currently i am working on recommendation system in a SNS site. There are
> 15M+ registered members in our site. We already have a PYMK
> implementation(not use mahout or any machine learning algorithms libs), but
> the accuracy of recommend results produced by current implementation is not
> as good as we expected, so i'm looking for a better way to implement this
> feature.
>
> Here are some rules should be considered when recommend "People You May
> Know" to current member: (any supplementaries?)
> Contacts list imported by current member;
> Same company:
> overlap of employed date range between current member and recommended
> members;
> size of company;
> function of current member and recommended members;
> Same login IP
> Same school
> Mutual Friends
>
>
> As far as i know, Mahout is focus on CF(Collaborative filtering), but PYMK
> is more likely a content-based recommendation, because the informations
> that hold in member's profile is base of PYMK processing.
>
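
To make the classification framing in my reply concrete, here is a small self-contained sketch in plain Python. It is only an illustration under stated assumptions: the three features (contact-list Jaccard overlap, same company, same school), the toy rows and labels, and the helper names `train_logistic` and `predict` are all invented for the example. In practice you would use a library implementation of logistic regression rather than this hand-rolled SGD loop.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the score
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


def predict(weights, bias, x):
    """Estimated probability that the member knows the candidate."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy training data, invented for illustration only.
# Columns: [contact_jaccard, same_company, same_school]
rows = [
    [0.40, 1, 0],
    [0.25, 0, 1],
    [0.30, 1, 1],
    [0.00, 1, 0],
    [0.05, 0, 0],
    [0.00, 0, 1],
]
# Target: 1 if the member knows the candidate, 0 otherwise.
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(rows, labels)

# Rank candidates for one member by predicted probability.
candidates = {"high_overlap": [0.35, 1, 0], "no_overlap": [0.00, 1, 0]}
ranked = sorted(candidates, key=lambda c: predict(weights, bias, candidates[c]), reverse=True)
print(ranked)
```

The labels would normally come from observed behavior, for example whether a shown recommendation was accepted, and the recommendations for a member are then the candidates with the highest predicted probability.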
