Re: Row similarities

Suneel Marthi Sat, 17 Jan 2015 08:37:35 -0800

Andrew, you would be better off using Mahout's RowSimilarityJob for what you are 
trying to accomplish.

 1.  It does give you pair-wise distances.
 2.  You can specify the distance measure you are looking to use.
 3.  There's the old MapReduce implementation and the Spark DSL implementation, 
     per your preference.
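
For readers following the thread, the idea of pair-wise distances with a selectable distance measure can be sketched in plain Python. This is only an illustration under stated assumptions: it is not Mahout's RowSimilarityJob or any Spark API, and the names `euclidean`, `cosine_distance` and `pairwise_distances` are hypothetical helpers made up for this example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either row is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def pairwise_distances(rows, measure=euclidean):
    """Map each unordered row pair (i, j), i < j, to its distance."""
    return {
        (i, j): measure(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }


rows = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
distances = pairwise_distances(rows)                      # Euclidean by default
cosine = pairwise_distances(rows, measure=cosine_distance)
```

This brute-force version compares every pair, so the work grows quadratically with the number of rows; distributed implementations exist largely to cope with that cost.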

From: Andrew Musselman <andrew.mussel...@gmail.com>
To: Reza Zadeh <r...@databricks.com>
Cc: user <user@spark.apache.org>
Sent: Saturday, January 17, 2015 11:29 AM
Subject: Re: Row similarities

Thanks Reza, interesting approach.  I think what I actually want is to 
calculate pair-wise distance, on second thought.  Is there a pattern for that?

On Jan 16, 2015, at 9:53 PM, Reza Zadeh <r...@databricks.com> wrote:

You can use K-means with a suitably large k. Each cluster should correspond to 
rows that are similar to one another.
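
The clustering suggestion can be illustrated with a minimal k-means sketch in plain Python. It is an assumption-laden toy, not Spark MLlib's KMeans: the function names are hypothetical, the initial centroids are simply the first k rows, and it runs a fixed number of iterations.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Return a cluster index for every row (naive Lloyd's algorithm)."""
    # Naive initialization (an assumption of this sketch): first k rows.
    centroids = [list(row) for row in rows[:k]]
    assignment = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


rows = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]]
assignment = kmeans(rows, k=2)
# Rows sharing a cluster index are treated as similar to one another.
```

Note that this groups similar rows but does not produce a distance for each pair of rows.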

On Fri, Jan 16, 2015 at 5:18 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:

What's a good way to calculate similarities between all vector-rows in a matrix 
or RDD[Vector]?

I'm seeing RowMatrix has a columnSimilarities method but I'm not sure I'm going 
down a good path to transpose a matrix in order to run that.
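
The transpose idea in the question can be checked on a small in-memory matrix: cosine similarities between the columns of the transposed matrix are exactly the similarities between the original rows. The sketch below is plain Python with hypothetical helper names; it does not call Spark's RowMatrix and says nothing about how costly a distributed transpose would be.

```python
import math


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def column_cosine_similarities(matrix):
    """Cosine similarity for every column pair (i, j) with i < j."""
    cols = transpose(matrix)
    norms = [math.sqrt(sum(x * x for x in col)) for col in cols]
    sims = {}
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # similarity is undefined for an all-zero column
            dot = sum(x * y for x, y in zip(cols[i], cols[j]))
            sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims


def row_cosine_similarities(matrix):
    """Row similarities, obtained as column similarities of the transpose."""
    return column_cosine_similarities(transpose(matrix))


matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
row_sims = row_cosine_similarities(matrix)
```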
