Re: Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?

2016-11-19 Thread Yanbo Liang
Hi Russell,

Do you want to use RowMatrix.columnSimilarities to calculate cosine
similarities?
If so, you should use the following steps:

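import org.apache.spark.mllib
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

val dataset: DataFrame = ???  // placeholder: your input DataFrame
// Convert the features column from ml.linalg.Vector to mllib.linalg.Vector
val oldDataset: DataFrame = MLUtils.convertVectorColumnsFromML(dataset, "features")
// Convert from DataFrame to RDD
val oldRDD: RDD[mllib.linalg.Vector] =
  oldDataset.select(col("features")).rdd.map { row => row.getAs[mllib.linalg.Vector](0) }
// Generate RowMatrix (nRows: Long and nCols: Int are the matrix dimensions)
val mat: RowMatrix = new RowMatrix(oldRDD, nRows, nCols)
// Cosine similarities between the columns of mat
mat.columnSimilarities()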
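
One thing to keep in mind with the steps above: columnSimilarities() compares the columns of the matrix with each other, not the rows. The plain-Java sketch below does not use Spark at all; it only illustrates the same quantity on a small dense array. The class and method names are made up for illustration, and whether column-wise similarity is what you need depends on how your data is laid out, which is an assumption here.

```java
/** Illustration only: cosine similarity between the columns of a small dense matrix. */
final class ColumnCosine {

    /**
     * Returns an nCols x nCols array where sims[i][j], for i < j, is the cosine
     * similarity between column i and column j. Entries with i >= j are left at 0,
     * so only the upper triangle is filled.
     */
    static double[][] columnSimilarities(double[][] rows) {
        int nCols = rows.length == 0 ? 0 : rows[0].length;
        double[] squaredNorms = new double[nCols];
        double[][] dots = new double[nCols][nCols];

        // One pass over the rows accumulates column norms and pairwise dot products.
        for (double[] row : rows) {
            for (int i = 0; i < nCols; i++) {
                squaredNorms[i] += row[i] * row[i];
                for (int j = i + 1; j < nCols; j++) {
                    dots[i][j] += row[i] * row[j];
                }
            }
        }

        double[][] sims = new double[nCols][nCols];
        for (int i = 0; i < nCols; i++) {
            for (int j = i + 1; j < nCols; j++) {
                double denom = Math.sqrt(squaredNorms[i]) * Math.sqrt(squaredNorms[j]);
                // An all-zero column has no direction; report 0 rather than NaN.
                sims[i][j] = denom == 0.0 ? 0.0 : dots[i][j] / denom;
            }
        }
        return sims;
    }
}
```

For a matrix with 2 rows and 3 columns this returns a 3 x 3 result, one entry per pair of columns. If the goal is to compare feature vectors (rows) with each other, the data would need to be arranged so that the vectors being compared end up as columns.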

Please let me know whether this satisfies your requirements.


Thanks
Yanbo

On Wed, Nov 16, 2016 at 9:26 AM, Russell Jurney 
wrote:

> Asher, can you cast like that? Does that casting work? That is my
> confusion: I don't know what a DataFrame Vector turns into in terms of an
> RDD type.
>
> I'll try this, thanks.
>
> On Tue, Nov 15, 2016 at 11:25 AM, Asher Krim  wrote:
>
>> What language are you using? For Java, you might convert the dataframe to
>> an rdd using something like this:
>>
>> df
>> .toJavaRDD()
>> .map(row -> (SparseVector)row.getAs(row.fieldIndex("columnName")));
>>
>> On Tue, Nov 15, 2016 at 1:06 PM, Russell Jurney wrote:
>>
>>> I have two dataframes with common feature vectors and I need to get the
>>> cosine similarity of one against the other. It looks like this is possible
>>> in the RDD based API, mllib, but not in ml.
>>>
>>> So, how do I convert my sparse dataframe vectors into something spark
>>> mllib can use? I've searched, but haven't found anything.
>>>
>>> Thanks!
>>> --
>>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
>>>
>>
>>
>>
>> --
>> Asher Krim
>> Senior Software Engineer
>>
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
>


Re: Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?

2016-11-16 Thread Russell Jurney
Asher, can you cast like that? Does that casting work? That is my
confusion: I don't know what a DataFrame Vector turns into in terms of an
RDD type.

I'll try this, thanks.

On Tue, Nov 15, 2016 at 11:25 AM, Asher Krim  wrote:

> What language are you using? For Java, you might convert the dataframe to
> an rdd using something like this:
>
> df
> .toJavaRDD()
> .map(row -> (SparseVector)row.getAs(row.fieldIndex("columnName")));
>
> On Tue, Nov 15, 2016 at 1:06 PM, Russell Jurney 
> wrote:
>
>> I have two dataframes with common feature vectors and I need to get the
>> cosine similarity of one against the other. It looks like this is possible
>> in the RDD based API, mllib, but not in ml.
>>
>> So, how do I convert my sparse dataframe vectors into something spark
>> mllib can use? I've searched, but haven't found anything.
>>
>> Thanks!
>> --
>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
>>
>
>
>
> --
> Asher Krim
> Senior Software Engineer
>



-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io


Re: Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?

2016-11-15 Thread Asher Krim
What language are you using? For Java, you might convert the dataframe to
an rdd using something like this:

df
.toJavaRDD()
.map(row -> (SparseVector)row.getAs(row.fieldIndex("columnName")));
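
Once the rows are available as SparseVector objects, the cosine similarity of one vector against another can be computed directly from their index and value arrays (Spark's SparseVector exposes these as indices() and values(), with indices in increasing order). Here is a minimal sketch in plain Java, without Spark, that takes those arrays as inputs. The class and method names are hypothetical, and it assumes both index arrays are sorted ascending.

```java
/** Illustration only: cosine similarity for sparse vectors given as sorted index/value arrays. */
final class SparseCosine {

    /** Dot product via a two-pointer merge over the two sorted index arrays. */
    static double dot(int[] idxA, double[] valA, int[] idxB, double[] valB) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < idxA.length && j < idxB.length) {
            if (idxA[i] == idxB[j]) {
                // Both vectors have a stored entry at this index.
                sum += valA[i] * valB[j];
                i++;
                j++;
            } else if (idxA[i] < idxB[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    /** Euclidean norm; entries that are not stored are zero and contribute nothing. */
    static double norm(double[] values) {
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += v * v;
        }
        return Math.sqrt(sumSq);
    }

    /** Cosine similarity; returns 0 if either vector is all zeros. */
    static double cosine(int[] idxA, double[] valA, int[] idxB, double[] valB) {
        double denom = norm(valA) * norm(valB);
        return denom == 0.0 ? 0.0 : dot(idxA, valA, idxB, valB) / denom;
    }

    public static void main(String[] args) {
        // a = (1, 0, 2, 0) and b = (0, 0, 2, 3) in dense form.
        int[] idxA = {0, 2};
        double[] valA = {1.0, 2.0};
        int[] idxB = {2, 3};
        double[] valB = {2.0, 3.0};
        System.out.println(cosine(idxA, valA, idxB, valB));
    }
}
```

Comparing every vector of one DataFrame against every vector of another this way is quadratic in the number of rows, so it is only a starting point for small inputs.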

On Tue, Nov 15, 2016 at 1:06 PM, Russell Jurney 
wrote:

> I have two dataframes with common feature vectors and I need to get the
> cosine similarity of one against the other. It looks like this is possible
> in the RDD based API, mllib, but not in ml.
>
> So, how do I convert my sparse dataframe vectors into something spark
> mllib can use? I've searched, but haven't found anything.
>
> Thanks!
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
>



-- 
Asher Krim
Senior Software Engineer


Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?

2016-11-15 Thread Russell Jurney
I have two dataframes with common feature vectors and I need to get the
cosine similarity of one against the other. It looks like this is possible
in the RDD based API, mllib, but not in ml.

So, how do I convert my sparse dataframe vectors into something spark mllib
can use? I've searched, but haven't found anything.

Thanks!
-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io