I have encountered the "all-pairs similarity" problem in my recommendation system. Thanks to this databricks blog, it seems RowMatrix may come to help.
However, RowMatrix is a matrix type without meaningful row indices, thereby I don't know how to retrieve the similarity result after invoking columnSimilarities(threshold) for specific item i and j Below is some details about what I am doing: 1) My data file comes from Movielens with format like this: user::item::rating 2) I build up a RowMatrix in which each sparse vector i represents the ratings of all users to this item i val dataPath = ... val ratings: RDD[Rating] = sc.textFile(dataPath).map(_.split("::") match { case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble) }) val rows = ratings.map(rating=>(rating.product, (rating.user, rating.rating))) .groupByKey() .map(p => Vectors.sparse(userAmount, p._2.map(r=>(r._1-1, r._2)).toSeq)) val mat = new RowMatrix(rows) val similarities = mat.columnSimilarities(0.5) Now I get a CoordinateMatrix similarities. How can I get the similarity of specific item i and j? Although it can be used to retrieve a RDD[MatrixEntry], I am not sure whether the row i and column j correspond to item i and j. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-retrieve-item-pair-after-calculating-similarity-using-RowMatrix-tp22654.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org