You can use K-means <https://spark.apache.org/docs/latest/mllib-clustering.html> with a suitably large k. Rows that land in the same cluster will be close to one another, so each cluster groups rows that are similar.
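For intuition, here is a minimal local sketch in plain Python (not Spark) of what the clustering step does. It runs Lloyd's k-means over a few rows and then groups row indices by cluster. The function name `kmeans_groups`, the first-k-rows initialization, and the fixed iteration count are assumptions for illustration only. On an RDD[Vector] you would instead call MLlib's `KMeans.train` and then `KMeansModel.predict` to get each row's cluster.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_groups(rows, k, iterations=20):
    """Hypothetical helper: cluster rows with Lloyd's k-means and
    return a sorted list of row-index groups, one per non-empty cluster."""
    # Assumed deterministic init: use the first k rows as starting centers.
    centers = [list(r) for r in rows[:k]]
    assignment = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest center.
        assignment = [
            min(range(k), key=lambda c: dist(row, centers[c])) for row in rows
        ]
        # Update step: move each center to the mean of its assigned rows.
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assignment) if a == c]
            if members:  # keep the previous center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    groups = {}
    for i, a in enumerate(assignment):
        groups.setdefault(a, []).append(i)
    return sorted(groups.values())


rows = [[0, 0], [10, 10], [0.1, 0.2], [9.9, 10.1], [0.2, 0.1], [10.2, 9.8]]
print(kmeans_groups(rows, 2))  # [[0, 2, 4], [1, 3, 5]]
```

Note that this gives you groups rather than a pairwise similarity score. If you need actual scores, computing them only within each (smaller) cluster keeps the all-pairs work manageable.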
On Fri, Jan 16, 2015 at 5:18 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
> What's a good way to calculate similarities between all vector-rows in a
> matrix or RDD[Vector]?
>
> I'm seeing RowMatrix has a columnSimilarities method but I'm not sure I'm
> going down a good path to transpose a matrix in order to run that.
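On the transpose question quoted above: `RowMatrix.columnSimilarities` returns cosine similarities between columns as an upper-triangular `CoordinateMatrix`. Transposing first therefore turns row similarities into column similarities. In MLlib that route would look roughly like `IndexedRowMatrix` → `toCoordinateMatrix()` → `transpose()` → `toRowMatrix()` → `columnSimilarities()`, assuming your Spark version provides `CoordinateMatrix.transpose`. The plain-Python sketch below (helper names `cosine` and `row_similarities` are hypothetical) shows what that computes locally: cosine similarity for every pair of rows i < j.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def row_similarities(rows):
    """Cosine similarity for every pair of rows i < j (upper triangle only),
    i.e. the column similarities of the transposed matrix."""
    return {
        (i, j): cosine(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }


sims = row_similarities([[1, 0], [0, 1], [1, 1]])
print(sims)  # (0, 1) -> 0.0; (0, 2) and (1, 2) -> about 0.7071
```

This is brute force, so it costs O(n²) comparisons in the number of rows. Because `columnSimilarities` is aimed at matrices with many rows and comparatively few columns, the transpose route is likely only practical when the number of original rows is modest.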