Yes, you are totally right. I had misunderstood what the method does, and it
works perfectly now that I construct the matrix as the transpose. I really
appreciate your help, thanks!


Date: Sat, 25 Apr 2015 20:57:04 -0700
Subject: Re: How can I retrieve item-pair after calculating similarity using 
RowMatrix
From: jos...@databricks.com
To: zhengweita...@outlook.com
CC: user@spark.apache.org

It looks like your code is making 1 Row per item, which means that
columnSimilarities will compute similarities between users.  If you transpose
the matrix (or construct it as the transpose), then columnSimilarities should
do what you want, and it will return meaningful indices.

Joseph
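
For readers of the archive, here is a minimal sketch of what "construct it as the transpose" could look like: one row per user and one column per item, so that columnSimilarities compares items. The helper name itemSimilarities and the parameter itemAmount (taken to be the largest item ID) are assumptions for illustration and do not appear in the original code. It also assumes 1-based item IDs and at most one rating per (user, item) pair, as in the MovieLens files.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, RowMatrix}
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

// Hypothetical helper: builds the user-by-item matrix (the transpose of the
// item-by-user matrix in the question) and computes column similarities.
def itemSimilarities(
    ratings: RDD[Rating],
    itemAmount: Int,      // assumed: largest item ID in the data
    threshold: Double): CoordinateMatrix = {
  val rows = ratings
    // Key by user; shift the 1-based item ID to a 0-based column index.
    .map(r => (r.user, (r.product - 1, r.rating)))
    .groupByKey()
    // One sparse row per user, with one slot per item.
    .map { case (_, itemRatings) => Vectors.sparse(itemAmount, itemRatings.toSeq) }

  // Columns are items now, so entry (i, j) compares item i+1 with item j+1.
  new RowMatrix(rows).columnSimilarities(threshold)
}
```

With this layout the row order of the RowMatrix no longer matters, because the item identity is carried by the column index rather than by the row position.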
On Fri, Apr 24, 2015 at 11:20 PM, amghost <zhengweita...@outlook.com> wrote:
I have encountered the "all-pairs similarity" problem in my recommendation
system. Thanks to this databricks blog, it seems RowMatrix may help.

However, RowMatrix is a matrix type without meaningful row indices, so I
don't know how to retrieve the similarity result for a specific item i and j
after invoking columnSimilarities(threshold).



Below are some details about what I am doing:

1) My data file comes from MovieLens, in this format:

user::item::rating

2) I build a RowMatrix in which each sparse vector i represents the ratings
of all users for item i:



val dataPath = ...

val ratings: RDD[Rating] = sc.textFile(dataPath).map(_.split("::") match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
})

val rows = ratings.map(rating => (rating.product, (rating.user, rating.rating)))
  .groupByKey()
  .map(p => Vectors.sparse(userAmount, p._2.map(r => (r._1 - 1, r._2)).toSeq))

val mat = new RowMatrix(rows)

val similarities = mat.columnSimilarities(0.5)

Now I get a CoordinateMatrix, similarities. How can I get the similarity of
specific items i and j? Although it can be used to retrieve an
RDD[MatrixEntry], I am not sure whether row i and column j correspond to
items i and j.
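
On the indexing question: columnSimilarities indexes its result by column of the input matrix and returns an upper-triangular CoordinateMatrix, so each stored entry has i < j. A minimal lookup sketch follows, assuming the input matrix has one column per item with column index = item ID - 1 (which is not the layout built above, where columns are users). The helper name similarityOf is hypothetical.

```scala
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix

// Hypothetical helper: looks up the similarity of two 1-based item IDs,
// assuming column index = item ID - 1 in the matrix that was compared.
def similarityOf(sims: CoordinateMatrix, itemA: Int, itemB: Int): Option[Double] = {
  // Only the upper triangle is stored, so order the pair as (smaller, larger).
  val i = math.min(itemA, itemB) - 1L
  val j = math.max(itemA, itemB) - 1L
  sims.entries
    .filter(e => e.i == i && e.j == j)
    .map(_.value)
    .collect()
    .headOption
}
```

A result of None means that no entry was stored for the pair, which can happen when the two columns share no non-zero rows or, with a positive threshold, when the pair was dropped by the sampling.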







--

View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-retrieve-item-pair-after-calculating-similarity-using-RowMatrix-tp22654.html

Sent from the Apache Spark User List mailing list archive at Nabble.com.






