I have run into the "all-pairs similarity" problem in my recommendation
system. According to this Databricks blog post, RowMatrix looks like it can help.

However, RowMatrix is a matrix type without meaningful row indices, so
I don't know how to retrieve the similarity of a specific pair of items i and j
after calling columnSimilarities(threshold).

Below are some details about what I am doing:

1) My data file comes from MovieLens, with lines in this format:

user::item::rating

2) I build a RowMatrix in which sparse vector i holds the ratings that all
users gave to item i:

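import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

val dataPath = ...
// Parse each "user::item::rating" line into a Rating.
val ratings: RDD[Rating] = sc.textFile(dataPath).map(_.split("::") match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
})
// One sparse vector per item, indexed by (user id - 1).
// userAmount (defined elsewhere) is the vector size, i.e. the number of users.
val rows = ratings
  .map(rating => (rating.product, (rating.user, rating.rating)))
  .groupByKey()
  .map(p => Vectors.sparse(userAmount, p._2.map(r => (r._1 - 1, r._2)).toSeq))

val mat = new RowMatrix(rows)

val similarities = mat.columnSimilarities(0.5)

This gives me a CoordinateMatrix, similarities. How can I get the similarity
of a specific pair of items i and j? I can retrieve an RDD[MatrixEntry] from
it, but I am not sure whether row i and column j of that matrix correspond to
items i and j.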
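
A note on the indexing: columnSimilarities compares the columns of the RowMatrix, and the i and j of each returned MatrixEntry are column indices of that matrix (only the upper triangle, i < j, is stored). With one row per item, as in the code above, the columns are users, so the entries would describe user pairs rather than item pairs. Below is a minimal sketch of one possible alternative, assuming one row per user so that column c stands for item id c + 1. The helpers itemSimilarities and similarityOf and the parameter itemAmount (the number of items) are hypothetical names introduced for illustration, not part of MLlib.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, RowMatrix}
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

// Hypothetical helper: builds item-item similarities from one row per user,
// so that column index c corresponds to item id c + 1.
// itemAmount is an assumed name for the number of items (the largest item id).
def itemSimilarities(ratings: RDD[Rating], itemAmount: Int, threshold: Double): CoordinateMatrix = {
  val userRows = ratings
    .map(r => (r.user, (r.product - 1, r.rating)))
    .groupByKey()
    .map { case (_, itemRatings) => Vectors.sparse(itemAmount, itemRatings.toSeq) }
  new RowMatrix(userRows).columnSimilarities(threshold)
}

// Hypothetical helper: similarity of items a and b, given as 1-based item ids.
// Only entries with i < j are stored, so the pair is ordered before filtering.
// Returns None when no entry exists for the pair.
def similarityOf(sims: CoordinateMatrix, a: Int, b: Int): Option[Double] = {
  val (i, j) = (math.min(a, b) - 1L, math.max(a, b) - 1L)
  sims.entries
    .filter(e => e.i == i && e.j == j)
    .map(_.value)
    .collect()
    .headOption
}
```

For example, similarityOf(itemSimilarities(ratings, itemAmount, 0.5), 3, 7) would look up items 3 and 7. Because row order is irrelevant to column similarities, the missing row indices of RowMatrix do not matter in this layout. Filtering the whole RDD for every lookup is only reasonable for occasional queries; for many lookups, keying the entries by (i, j) once would likely be a better fit. With a positive threshold the values are sampled estimates, so a pair with low similarity may be absent.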


