You can use K-means
<https://spark.apache.org/docs/latest/mllib-clustering.html> with a
suitably large k. Each cluster should correspond to rows that are similar
to one another.
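
To illustrate the idea, here is a minimal local sketch in plain Python (not Spark code) of how k-means groups similar rows. The names `kmeans`, `nearest`, `squared_distance` and `rows`, the sample data, and the first-k-rows initialization are all assumptions made for this example. In MLlib the corresponding entry point is `KMeans.train` on an RDD of vectors.

```python
# Local illustration only: plain-Python k-means (Lloyd's algorithm).
# All names and data here are hypothetical; this is not the MLlib implementation.

def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    # Index of the center closest to the given point.
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))

def kmeans(rows, k, max_iterations=20):
    # Assumption for the sketch: use the first k rows as the initial centers
    # so the result is deterministic. Real implementations pick them randomly
    # or with a seeding scheme.
    centers = [list(r) for r in rows[:k]]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [nearest(r, centers) for r in rows]
        if new_assignment == assignment:
            break  # no row changed cluster, so we have converged
        assignment = new_assignment
        for c in range(k):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if members:
                # New center is the column-wise mean of the cluster's rows.
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

# Made-up sample rows: rows 0 and 2 are close, as are rows 1 and 3.
rows = [
    [1.0, 0.0, 0.1],
    [9.0, 9.1, 8.9],
    [0.9, 0.1, 0.0],
    [9.2, 8.8, 9.0],
]
assignment, centers = kmeans(rows, k=2)
print(assignment)
```

Rows that end up with the same cluster index are the ones treated as similar. Note that this uses Euclidean distance. If you care about direction rather than magnitude (cosine-style similarity), one option is to scale each row to unit length before clustering.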

On Fri, Jan 16, 2015 at 5:18 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> What's a good way to calculate similarities between all vector-rows in a
> matrix or RDD[Vector]?
>
> I'm seeing RowMatrix has a columnSimilarities method but I'm not sure I'm
> going down a good path to transpose a matrix in order to run that.
>
