Well, then you're computing the similarity of your features (the columns), not of your users. Whether that is meaningful depends a bit on the nature of your features and more on the similarity algorithm.
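To make the rows-vs-columns distinction concrete, here is a small local sketch in plain Python. It does not use Spark, and the function and variable names are hypothetical. It computes exact cosine similarities between every pair of distinct columns, a brute-force analogue of what columnSimilarities() computes when no threshold is given. With users as rows, the column pairs are feature pairs. To get user-user similarities you would have to run the same computation on the transpose, where the number of pairs grows as m(m-1)/2. For m in the billions, that is on the order of 10^17 pairs, which is far outside the "large m, small n" case DIMSUM is designed for.

```python
import math


def column_cosine_similarities(rows):
    """Exact cosine similarity for every pair of distinct columns (i < j).

    Hypothetical local helper, not a Spark API: a brute-force version of
    "normalized dot products between columns". The result has one entry per
    column pair, i.e. it scales with n*n where n is the number of columns.
    """
    n = len(rows[0])
    # Euclidean norm of each column, accumulated over all rows.
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) for j in range(n)]
    sims = {}
    for i in range(n):
        for j in range(i + 1, n):
            # This sketch skips pairs involving an all-zero column to avoid dividing by zero.
            if norms[i] == 0 or norms[j] == 0:
                continue
            dot = sum(r[i] * r[j] for r in rows)
            sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims


def transpose(rows):
    """Turn rows into columns (users become columns, features become rows)."""
    return [list(col) for col in zip(*rows)]


# Toy data: m = 4 users (rows) x n = 3 features (columns).
users = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0],
    [3.0, 1.0, 0.0],
]

feature_sims = column_cosine_similarities(users)           # feature-feature pairs
user_sims = column_cosine_similarities(transpose(users))   # user-user pairs

print(len(feature_sims))  # n*(n-1)/2 feature pairs
print(len(user_sims))     # m*(m-1)/2 user pairs
```

Users 0 and 2 have identical feature vectors, so their entry in user_sims is (up to floating-point rounding) 1.0. That entry only exists because the matrix was transposed first; the column similarities of the original matrix say nothing directly about pairs of users.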
On Wed, Dec 10, 2014 at 2:53 PM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
> Dear all,
>
> I'm trying to understand the correct use case of columnSimilarities
> as implemented in RowMatrix.
>
> As far as I know, this function computes the similarity between the columns
> of a given matrix. The DIMSUM paper says that it's efficient for large m (rows)
> and small n (columns). In this case the output will be an n by n matrix.
>
> Now, suppose I want to compute the similarity of several users, say m =
> billions. Each user is described by a high-dimensional feature vector, say
> n = 10000. In my dataset, one row represents one user. So in that case,
> computing the column similarity of my matrix is not the same as computing the
> similarity of all users. Then, what does it mean to compute the similarity of
> the columns of my matrix in this case?
>
> Best regards,
>
> Jao