Re: Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?
Hi Russell,

Do you want to use RowMatrix.columnSimilarities to calculate cosine similarities? If so, you could use the following steps:

    import org.apache.spark.mllib
    import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, RowMatrix}
    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    def cosineSimilarities(dataset: DataFrame): CoordinateMatrix = {
      // Convert the type of the features column from ml.linalg.Vector to mllib.linalg.Vector
      val oldDataset: DataFrame = MLUtils.convertVectorColumnsFromML(dataset, "features")
      // Convert from DataFrame to RDD
      val oldRDD: RDD[mllib.linalg.Vector] =
        oldDataset.select(col("features")).rdd.map(row => row.getAs[mllib.linalg.Vector](0))
      // Generate the RowMatrix (nRows and nCols may also be passed explicitly if known)
      val mat: RowMatrix = new RowMatrix(oldRDD)
      mat.columnSimilarities()
    }

Please feel free to let me know whether it meets your requirements.

Thanks
Yanbo

On Wed, Nov 16, 2016 at 9:26 AM, Russell Jurney wrote:
> Asher, can you cast like that? Does that casting work? That is my
> confusion: I don't know what a DataFrame Vector turns into in terms of an
> RDD type.
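One point worth keeping in mind about these steps: RowMatrix.columnSimilarities compares the columns of the matrix, while the RDD built above holds one feature vector per row, so as written it yields feature-to-feature similarities; to compare the vectors themselves they would have to be the columns (the matrix transposed). Called without a threshold, Spark computes the similarities exactly (brute force) and returns only the upper triangle (i < j) as a CoordinateMatrix. The plain-Java sketch below involves no Spark at all; the class name ColumnCosine is hypothetical. It computes the same exact column-wise quantity on a small in-memory matrix, which may help when sanity-checking results:

```java
/** Exact cosine similarity between every pair of columns of a small row-major
 *  matrix. Plain-Java illustration only; ColumnCosine is a hypothetical name. */
final class ColumnCosine {
    private ColumnCosine() {}

    /**
     * Returns an nCols x nCols array whose entries [i][j] with i < j hold the
     * cosine similarity of columns i and j; all other entries stay 0.0,
     * mirroring the upper-triangular shape of Spark's result.
     * Assumes every row has the same length.
     */
    static double[][] columnSimilarities(double[][] rows) {
        int nCols = rows.length == 0 ? 0 : rows[0].length;
        double[] squaredNorms = new double[nCols];
        double[][] dots = new double[nCols][nCols];
        for (double[] row : rows) {
            for (int i = 0; i < nCols; i++) {
                if (row[i] == 0.0) continue; // zeros add nothing to norms or dot products
                squaredNorms[i] += row[i] * row[i];
                for (int j = i + 1; j < nCols; j++) {
                    dots[i][j] += row[i] * row[j];
                }
            }
        }
        double[][] sims = new double[nCols][nCols];
        for (int i = 0; i < nCols; i++) {
            for (int j = i + 1; j < nCols; j++) {
                double denom = Math.sqrt(squaredNorms[i] * squaredNorms[j]);
                sims[i][j] = denom == 0.0 ? 0.0 : dots[i][j] / denom;
            }
        }
        return sims;
    }

    public static void main(String[] args) {
        // Three rows (observations) and two columns (the vectors being compared).
        double[][] m = { {1, 0}, {0, 1}, {1, 1} };
        System.out.println(columnSimilarities(m)[0][1]);
    }
}
```

Here the two columns are (1, 0, 1) and (0, 1, 1): their dot product is 1 and each has norm sqrt(2), giving a similarity of 0.5.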
Re: Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?
Asher, can you cast like that? Does that casting work? That is my confusion: I don't know what a DataFrame Vector turns into in terms of an RDD type.

I'll try this, thanks.

On Tue, Nov 15, 2016 at 11:25 AM, Asher Krim wrote:
> What language are you using? For Java, you might convert the dataframe to
> an rdd using something like this:
>
> df
>   .toJavaRDD()
>   .map(row -> (SparseVector)row.getAs(row.fieldIndex("columnName")));

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
Re: Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?
What language are you using? For Java, you might convert the dataframe to an rdd using something like this:

    // If the column was produced by spark.ml in Spark 2.x, this is org.apache.spark.ml.linalg.SparseVector,
    // which the RDD-based mllib API does not accept directly; the cast also fails for a DenseVector row.
    df
      .toJavaRDD()
      .map(row -> (SparseVector) row.getAs(row.fieldIndex("columnName")));

On Tue, Nov 15, 2016 at 1:06 PM, Russell Jurney wrote:
> I have two dataframes with common feature vectors and I need to get the
> cosine similarity of one against the other.

--
Asher Krim
Senior Software Engineer
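Once a vector is in hand as a sparse (indices, values) pair, cosine similarity only needs the entries the two vectors share. The plain-Java sketch below assumes both index arrays are sorted ascending, as a SparseVector normally stores them; the class name SparseCosine is hypothetical, and feeding it a Spark SparseVector's indices() and values() arrays is an assumed wiring, not something shown in this thread.

```java
/** Cosine similarity of two sparse vectors given as sorted (indices, values)
 *  arrays. Plain Java, no Spark; SparseCosine is a hypothetical name. */
final class SparseCosine {
    private SparseCosine() {}

    static double cosine(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal) {
        double dot = 0.0;
        int p = 0;
        int q = 0;
        // Merge-walk both sorted index arrays; only indices present in both contribute.
        while (p < aIdx.length && q < bIdx.length) {
            if (aIdx[p] == bIdx[q]) {
                dot += aVal[p] * bVal[q];
                p++;
                q++;
            } else if (aIdx[p] < bIdx[q]) {
                p++;
            } else {
                q++;
            }
        }
        double denom = Math.sqrt(sumOfSquares(aVal) * sumOfSquares(bVal));
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    private static double sumOfSquares(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v * v;
        return sum;
    }

    public static void main(String[] args) {
        // a = (1, 0, 2, 0) and b = (0, 0, 3, 4), both of size 4, in sparse form.
        double c = cosine(new int[] {0, 2}, new double[] {1, 2},
                          new int[] {2, 3}, new double[] {3, 4});
        System.out.println(c);
    }
}
```

In the example only index 2 is shared, so the dot product is 2 * 3 = 6, and the result is 6 / (sqrt(5) * 5).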
Spark ML DataFrame API - need cosine similarity, how to convert to RDD Vectors?
I have two dataframes with common feature vectors and I need to get the cosine similarity of one against the other. It looks like this is possible in the RDD-based API, mllib, but not in ml.

So, how do I convert my sparse dataframe vectors into something Spark mllib can use? I've searched, but haven't found anything.

Thanks!
--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
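Getting "the cosine similarity of one against the other" amounts to comparing every vector in the first set with every vector in the second. A common way to do this is to L2-normalise each vector once, after which each pair costs a single dot product. The plain-Java sketch below shows that idea on in-memory arrays; the class name CrossCosine is hypothetical, and in Spark the normalisation step might be done with org.apache.spark.ml.feature.Normalizer, though how to pair up the rows is not covered in this thread.

```java
/** Cosine similarity of every vector in one set against every vector in another.
 *  Plain Java, no Spark; CrossCosine is a hypothetical name. */
final class CrossCosine {
    private CrossCosine() {}

    /** Returns v scaled to unit L2 norm (a zero vector stays zero). */
    static double[] normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        double[] out = new double[v.length];
        for (int k = 0; k < v.length; k++) {
            out[k] = norm == 0.0 ? 0.0 : v[k] / norm;
        }
        return out;
    }

    /** result[i][j] = cosine(left[i], right[j]); all vectors must share one length. */
    static double[][] crossSimilarities(double[][] left, double[][] right) {
        double[][] l = new double[left.length][];
        double[][] r = new double[right.length][];
        for (int i = 0; i < left.length; i++) l[i] = normalize(left[i]);
        for (int j = 0; j < right.length; j++) r[j] = normalize(right[j]);
        double[][] result = new double[left.length][right.length];
        for (int i = 0; i < l.length; i++) {
            for (int j = 0; j < r.length; j++) {
                double dot = 0.0; // dot product of unit vectors = cosine similarity
                for (int k = 0; k < l[i].length; k++) dot += l[i][k] * r[j][k];
                result[i][j] = dot;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = { {1, 0}, {0, 1} };
        double[][] b = { {1, 1} };
        double[][] s = crossSimilarities(a, b);
        System.out.println(s[0][0] + " " + s[1][0]);
    }
}
```

Note that the number of pairs grows as the product of the two set sizes, so on large dataframes this all-pairs comparison is the expensive part regardless of which API performs it.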