Yeah, that's the kind of thing I'm looking for; I was looking at SPARK-4259 and 
poking around to see how to do things.

https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4259
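
For reference, a minimal sketch of what brute-force pair-wise distance means here, written in plain Python rather than against Spark or Mahout. The names `euclidean` and `pairwise_distances` are made up for illustration and are not part of MLlib or RowSimilarityJob; the sample rows are toy data.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(rows, distance=euclidean):
    """Return {(i, j): distance} for every pair of row indices with i < j.

    Illustrative helper only: it visits all n*(n-1)/2 pairs, so it is
    practical for small inputs but not for a large distributed matrix.
    """
    return {
        (i, j): distance(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }


# Toy data: three 2-dimensional row vectors.
rows = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
distances = pairwise_distances(rows)
```

The distance function is passed in as a parameter so a different measure can be swapped in. On an RDD the same idea could presumably be expressed by indexing the rows, pairing them with `cartesian`, and keeping only pairs with i < j, but that still materializes a quadratic number of pairs.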

> On Jan 17, 2015, at 8:35 AM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
> 
> Andrew, you would be better off using Mahout's RowSimilarityJob for what you 
> are trying to accomplish.
> 
>  1.  It does give you pair-wise distances
>  2.  You can specify the distance measure you are looking to use
>  3.  There's the old MapReduce impl and the Spark DSL impl, per your preference.
> 
> From: Andrew Musselman <andrew.mussel...@gmail.com>
> To: Reza Zadeh <r...@databricks.com> 
> Cc: user <user@spark.apache.org> 
> Sent: Saturday, January 17, 2015 11:29 AM
> Subject: Re: Row similarities
> 
> Thanks Reza, interesting approach.  I think what I actually want is to 
> calculate pair-wise distance, on second thought.  Is there a pattern for that?
> 
> 
> 
>> On Jan 16, 2015, at 9:53 PM, Reza Zadeh <r...@databricks.com> wrote:
>> 
>> You can use K-means with a suitably large k. Each cluster should correspond 
>> to rows that are similar to one another.
>> 
>> On Fri, Jan 16, 2015 at 5:18 PM, Andrew Musselman 
>> <andrew.mussel...@gmail.com> wrote:
>> What's a good way to calculate similarities between all vector-rows in a 
>> matrix or RDD[Vector]?
>> 
>> I'm seeing RowMatrix has a columnSimilarities method but I'm not sure I'm 
>> going down a good path to transpose a matrix in order to run that.
> 
> 
