Yes, you are totally right. I had misunderstood what the method does, and it works perfectly now that I construct the matrix as the transpose. Really appreciate your help, thanks!
Date: Sat, 25 Apr 2015 20:57:04 -0700
Subject: Re: How can I retrieve item-pair after calculating similarity using RowMatrix
From: jos...@databricks.com
To: zhengweita...@outlook.com
CC: user@spark.apache.org

It looks like your code is making 1 Row per item, which means that columnSimilarities will compute similarities between users. If you transpose the matrix (or construct it as the transpose), then columnSimilarities should do what you want, and it will return meaningful indices.

Joseph

On Fri, Apr 24, 2015 at 11:20 PM, amghost <zhengweita...@outlook.com> wrote:

I have encountered the "all-pairs similarity" problem in my recommendation system. Thanks to this databricks blog, it seems RowMatrix may help. However, RowMatrix is a matrix type without meaningful row indices, so I don't know how to retrieve the similarity for a specific pair of items i and j after invoking columnSimilarities(threshold).

Below are some details about what I am doing:

1) My data file comes from Movielens, with a format like this: user::item::rating

2) I build a RowMatrix in which each sparse vector i represents the ratings of all users for item i:

    val dataPath = ...
    val ratings: RDD[Rating] = sc.textFile(dataPath).map(_.split("::") match {
      case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
    })
    val rows = ratings.map(rating => (rating.product, (rating.user, rating.rating)))
      .groupByKey()
      .map(p => Vectors.sparse(userAmount, p._2.map(r => (r._1 - 1, r._2)).toSeq))
    val mat = new RowMatrix(rows)
    val similarities = mat.columnSimilarities(0.5)

Now I get a CoordinateMatrix, similarities. How can I get the similarity of specific items i and j? Although it can be used to retrieve an RDD[MatrixEntry], I am not sure whether row i and column j correspond to items i and j.
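For anyone finding this thread later, here is a minimal sketch of the fix Joseph describes: build one row per user, so that each column is an item and columnSimilarities compares items. It is only an illustration under some assumptions, not the code from the thread. The toy ratings, the object name ItemSimilaritySketch, the helper pairKey, and the value itemAmount are made up for the example; it also assumes item ids run from 1 to itemAmount and that each user rates an item at most once. It uses the no-argument columnSimilarities() (the exact version) instead of the 0.5 threshold to keep the toy case simple.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{MatrixEntry, RowMatrix}
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

object ItemSimilaritySketch {
  // columnSimilarities returns an upper-triangular matrix, so an entry for
  // columns (i, j) is stored with the smaller index first. Hypothetical helper.
  def pairKey(i: Int, j: Int): (Long, Long) =
    (math.min(i, j).toLong, math.max(i, j).toLong)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("item-sim-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Toy data standing in for the MovieLens file: Rating(user, item, rating).
    val ratings: RDD[Rating] = sc.parallelize(Seq(
      Rating(1, 1, 5.0), Rating(1, 2, 4.0),
      Rating(2, 1, 4.0), Rating(2, 2, 5.0), Rating(2, 3, 1.0),
      Rating(3, 3, 5.0)))

    // Assumption: item ids are 1..itemAmount, so column index = item id - 1.
    val itemAmount = ratings.map(_.product).max()

    // One row per USER (the transpose of the matrix in the question).
    val rows = ratings
      .map(r => (r.user, (r.product - 1, r.rating)))
      .groupByKey()
      .map { case (_, itemRatings) => Vectors.sparse(itemAmount, itemRatings.toSeq) }

    // Columns are items, so entry (i, j) is the similarity of items i+1 and j+1.
    val similarities = new RowMatrix(rows).columnSimilarities()

    // Look up the pair item 1 / item 2, i.e. column indices 0 and 1.
    val (i, j) = pairKey(1, 0)
    similarities.entries
      .filter { case MatrixEntry(r, c, _) => r == i && c == j }
      .collect()
      .foreach(e => println(s"sim(item ${e.i + 1}, item ${e.j + 1}) = ${e.value}"))

    sc.stop()
  }
}
```

Row order still carries no meaning in a RowMatrix, but that no longer matters here: only the column indices show up in the MatrixEntry results, and those are fixed by the item id used when building each sparse vector.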
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-retrieve-item-pair-after-calculating-similarity-using-RowMatrix-tp22654.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.