As Sean mentioned, you would be computing similar features then.
If you want to find similar users, I suggest running k-means with a
fixed number of clusters. It's not feasible to compute all pairwise
similarities between 1bn items, so k-means with a fixed k is more suitable
here.
Best
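
To make the k-means suggestion concrete, here is a minimal sketch in plain Python (not Spark's implementation). The `users` data, the `kmeans` helper, and seeding the centroids with the first k points are all assumptions made for illustration. The point is that each user is compared against only k centroids rather than against every other user.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    # Assumption: seed centroids with the first k points (deterministic, illustration only).
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        for idx, p in enumerate(points):
            assignments[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Hypothetical data: 5 users (rows) described by 3 product ratings (columns).
users = [
    [5.0, 4.0, 0.0],
    [0.0, 0.0, 3.0],
    [4.0, 5.0, 0.0],
    [1.0, 0.0, 5.0],
    [0.0, 1.0, 4.0],
]
centroids, assignments = kmeans(users, k=2)
print(assignments)
```

Users that land in the same cluster can then be treated as "similar", at a cost that grows with m * k per iteration instead of m * m.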
If you have a tall x skinny matrix of m users and n products, column
similarity will give you an n x n matrix (a product x product matrix). This
is also called the product correlation matrix. It can be cosine, Pearson, or
another kind of correlation. Note that if the entry is unobserved (user
Jaonary did n
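
As a rough illustration of the tall x skinny case described in this message, the sketch below computes exact cosine similarities between columns in plain Python. It is a brute-force toy, not the DIMSUM sampling scheme and not Spark's API. The `ratings` data and the `column_cosine_similarities` helper are hypothetical, and treating unobserved entries as 0 is an assumption of this sketch only.

```python
import math


def column_cosine_similarities(rows):
    """Return an n x n matrix of cosine similarities between the columns of `rows`."""
    n = len(rows[0])
    # Accumulate the Gram matrix A^T A one row at a time, so only n x n state is kept
    # no matter how many rows (users) there are.
    gram = [[0.0] * n for _ in range(n)]
    for row in rows:
        for i in range(n):
            if row[i] == 0:
                continue  # a zero entry contributes nothing to row i of the Gram matrix
            for j in range(n):
                gram[i][j] += row[i] * row[j]
    # The diagonal of A^T A holds the squared column norms.
    norms = [math.sqrt(gram[i][i]) for i in range(n)]
    sims = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                sims[i][j] = gram[i][j] / (norms[i] * norms[j])
    return sims


# Hypothetical ratings: m = 5 users (rows) x n = 3 products (columns).
ratings = [
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 0.0, 5.0],
    [0.0, 1.0, 4.0],
]
sims = column_cosine_similarities(ratings)
for line in sims:
    print([round(v, 3) for v in line])
```

The result is 3 x 3 regardless of the number of users, which is why a large m with a small n is the comfortable regime for this kind of computation.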
Well, you're computing similarity of your features then. Whether it is
meaningful depends a bit on the nature of your features and more on
the similarity algorithm.
On Wed, Dec 10, 2014 at 2:53 PM, Jaonary Rabarisoa wrote:
> Dear all,
>
> I'm trying to understand what is the correct use case of C
Dear all,
I'm trying to understand what is the correct use case of ColumnSimilarity
implemented in RowMatrix.
As far as I know, this function computes the similarities between the columns of a
given matrix. The DIMSUM paper says that it's efficient for large m (rows)
and small n (columns). In this case the