Re: DIMSUM and ColumnSimilarity use case ?

Sean Owen Wed, 10 Dec 2014 07:40:20 -0800

Well, then you're computing the similarity between your features (the
columns), not between your users (the rows). Whether that is meaningful
depends a bit on the nature of your features and more on the similarity
measure you use.
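
To make the row/column distinction concrete, here is a minimal sketch in
plain Python. It is a brute-force, exact cosine similarity on a toy matrix,
not DIMSUM (which samples) and not Spark's implementation. The names
`cosine`, `pairwise_cosine`, `users`, `feature_sim` and `user_sim` are made
up for illustration, and cosine is assumed as the similarity measure.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def pairwise_cosine(vectors):
    """All-pairs cosine similarity; returns a len(vectors) x len(vectors) list of lists."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

# Toy data (hypothetical): m = 4 users (rows), n = 3 features (columns).
users = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 1.0],
]

# Transpose so that each inner list is one feature column.
columns = [list(col) for col in zip(*users)]

feature_sim = pairwise_cosine(columns)  # n x n: similarity between features
user_sim = pairwise_cosine(users)       # m x m: similarity between users

print("feature similarity:", len(feature_sim), "x", len(feature_sim[0]))
print("user similarity:   ", len(user_sim), "x", len(user_sim[0]))
```

The column version stays n x n no matter how many users there are. The
row version is m x m, so with m in the billions it would be enormous, and
transposing the matrix to reuse a column-similarity routine would leave the
"large m, small n" regime described in the quoted message below.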


On Wed, Dec 10, 2014 at 2:53 PM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
> Dear all,
>
> I'm trying to understand the correct use case for columnSimilarities,
> implemented in RowMatrix.
>
> As far as I know, this function computes the similarity between the columns
> of a given matrix. The DIMSUM paper says that it's efficient for large m (rows)
> and small n (columns). In this case the output will be an n by n matrix.
>
> Now, suppose I want to compute the similarity between users, say m =
> billions. Each user is described by a high-dimensional feature vector, say
> n = 10000. In my dataset, one row represents one user. So in that case,
> computing the column similarity of my matrix is not the same as computing the
> similarity of all users. What, then, does it mean to compute the similarity of
> the columns of my matrix in this case?
>
> Best regards,
>
> Jao

