Re: How to recommend most similar users using Spark ML
There are also some Spark packages for finding approximate nearest neighbors using locality sensitive hashing: https://spark-packages.org/?q=tags%3Alsh

On Fri, Jul 15, 2016 at 7:45 AM nguyen duc Tuan <newvalu...@gmail.com> wrote:
> [...]
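As a rough illustration of the locality sensitive hashing idea mentioned in this reply, here is a minimal sketch in plain Scala (no Spark, and not taken from any of the listed packages). It uses random hyperplane hashing, one common LSH family for cosine similarity: users whose vectors fall on the same side of every random hyperplane share a bucket, and exact cosine similarity is computed only inside the query user's bucket. The names LshSketch, randomHyperplanes, signature, buildBuckets and similarUsers are made up for this example, and numBits is assumed to be at most 31 so that a signature fits in an Int.

```scala
import scala.util.Random

// Hypothetical helper object for illustration; not part of Spark or any LSH package.
object LshSketch {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double =
    a.indices.map(i => a(i) * b(i)).sum

  def cosine(a: Vec, b: Vec): Double =
    dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

  /** Draws numBits random hyperplanes, each given by a Gaussian normal vector. */
  def randomHyperplanes(numBits: Int, dim: Int, seed: Long): Array[Vec] = {
    val rng = new Random(seed)
    Array.fill(numBits)(Array.fill(dim)(rng.nextGaussian()))
  }

  /** One bit per hyperplane, set when the vector lies on its non-negative side. */
  def signature(v: Vec, planes: Array[Vec]): Int =
    planes.indices.foldLeft(0) { (bits, i) =>
      if (dot(v, planes(i)) >= 0.0) bits | (1 << i) else bits
    }

  /** Groups users by signature; built once and reused for every query. */
  def buildBuckets(users: Map[Int, Vec], planes: Array[Vec]): Map[Int, Map[Int, Vec]] =
    users.groupBy { case (_, v) => signature(v, planes) }

  /** Approximate top-n: scores only the users in the query user's bucket. */
  def similarUsers(
      userId: Int,
      users: Map[Int, Vec],
      buckets: Map[Int, Map[Int, Vec]],
      planes: Array[Vec],
      n: Int): Seq[(Int, Double)] = {
    val query = users(userId)
    val candidates =
      buckets.getOrElse(signature(query, planes), Map.empty[Int, Vec]) - userId
    candidates.toSeq
      .map { case (id, v) => (id, cosine(query, v)) }
      .sortBy { case (_, sim) => -sim }
      .take(n)
  }
}
```

More bits give smaller buckets and fewer comparisons, at the cost of missing more true neighbors; a common remedy is to build several independent hash tables and merge their candidates before scoring.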
Re: How to recommend most similar users using Spark ML
Hi jeremycod,

If you want to find the top N nearest neighbors for all users with an exact top-k algorithm, I recommend using the same approach as MLlib:
https://github.com/apache/spark/blob/85d6b0db9f5bd425c36482ffcb1c3b9fd0fcdb31/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala#L272

If the number of users is large, the exact top-k algorithm can be rather slow; in that case, try an approximate nearest neighbors algorithm. There is a good benchmark of various libraries here:
https://github.com/erikbern/ann-benchmarks

2016-07-15 10:36 GMT+07:00 jeremycod <zoran.jere...@gmail.com>:
> [...]
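To make the exact top-k suggestion above more concrete, here is a minimal sketch in plain Scala collections rather than Spark. It is only loosely modeled on the idea of pairing blocks of factor vectors and keeping a bounded number of best matches per user; it is not a copy of the linked MLlib code. Vectors are normalized once so that cosine similarity reduces to a dot product, and each user keeps a min-heap of at most k candidates. The names ExactTopK and topKForAll, and the blockSize parameter, are assumptions made for this example.

```scala
import scala.collection.mutable

// Hypothetical helper object for illustration; not the MLlib implementation.
object ExactTopK {
  type Vec = Array[Double]

  /** Scales a vector to unit length; zero vectors are returned unchanged. */
  def normalize(v: Vec): Vec = {
    val norm = math.sqrt(v.map(x => x * x).sum)
    if (norm == 0.0) v else v.map(_ / norm)
  }

  def dot(a: Vec, b: Vec): Double =
    a.indices.map(i => a(i) * b(i)).sum

  /** For every user, returns the k most similar other users, best first. */
  def topKForAll(
      factors: Seq[(Int, Vec)],
      k: Int,
      blockSize: Int = 4096): Map[Int, Seq[(Int, Double)]] = {
    // After normalization, cosine similarity is just a dot product.
    val unit = factors.map { case (id, v) => (id, normalize(v)) }

    // Reversed ordering turns the queue into a min-heap on similarity,
    // so dequeue() evicts the weakest candidate.
    val lowestFirst = Ordering.by[(Int, Double), Double](_._2).reverse
    val heaps = unit.map { case (id, _) =>
      id -> mutable.PriorityQueue.empty[(Int, Double)](lowestFirst)
    }.toMap

    // Compare block against block; in a distributed setting each block pair
    // could be processed independently.
    for (srcBlock <- unit.grouped(blockSize); dstBlock <- unit.grouped(blockSize)) {
      for ((srcId, srcVec) <- srcBlock; (dstId, dstVec) <- dstBlock if srcId != dstId) {
        val heap = heaps(srcId)
        heap.enqueue((dstId, dot(srcVec, dstVec)))
        if (heap.size > k) heap.dequeue()
      }
    }

    heaps.map { case (id, heap) =>
      id -> heap.toSeq.sortBy { case (_, sim) => -sim }
    }
  }
}
```

The work is still quadratic in the number of users; the bounded heap only keeps memory per user at k entries, which is why an approximate method becomes attractive for large user counts.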
How to recommend most similar users using Spark ML
Hi,

I need to develop a service that recommends to each user other similar users that they can connect to. For each user I have data about the user's preferences for specific items, in the form:

    user, item, preference
    1, 75, 0.89
    2, 168, 0.478
    2, 99, 0.321
    3, 31, 0.012

So far, I have implemented an approach using cosine similarity that compares one user's feature vector with those of all other users:

    import org.jblas.DoubleMatrix

    // Cosine similarity between two factor vectors.
    def cosineSimilarity(vec1: DoubleMatrix, vec2: DoubleMatrix): Double = {
      vec1.dot(vec2) / (vec1.norm2() * vec2.norm2())
    }

    // Prints the recNumber users most similar to userid.
    // `model` is defined elsewhere; it must expose userFeatures as (id, factors) pairs.
    def user2usersimilarity(userid: Int, recNumber: Int): Unit = {
      val userFactor = model.userFeatures.lookup(userid).head
      val userVector = new DoubleMatrix(userFactor)
      // Sanity check: similarity of a vector with itself (value is unused).
      val s1 = cosineSimilarity(userVector, userVector)
      val sims = model.userFeatures.map { case (id, factor) =>
        val factorVector = new DoubleMatrix(factor)
        val sim = cosineSimilarity(factorVector, userVector)
        (id, sim)
      }
      // Fetch one extra result; the top hit is assumed to be the user itself.
      val sortedSims = sims.top(recNumber + 1)(
        Ordering.by[(Int, Double), Double] { case (id, similarity) => similarity })
      // Drop the first entry (the user itself) before printing.
      println(sortedSims.slice(1, recNumber + 1).mkString("\n"))
    }

This approach works fine with the MovieLens dataset in terms of recommendation quality. However, I am concerned about its performance: since I have to generate recommendations for all users in the system, I would have to compare each user with every other user.

I would appreciate it if somebody could suggest how to limit the comparison to a user's top N neighbors, or some other algorithm that would work better for my use case.

Thanks,
Zoran

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-recommend-most-similar-users-using-Spark-ML-tp27342.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.