Hello,
I have a dataset containing TF-IDF vectors for a corpus of documents. How
do I perform a nearest neighbour search on the dataset using cosine
similarity? This is how I compute the TF-IDF vectors:
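import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

// Read the headerless CSV; the document text is in the third column (_c2).
val df = spark.read.option("header", "false").csv("data")
// Split the text into words, hash the words into term-frequency vectors,
// then rescale the term frequencies by inverse document frequency.
val tk = new Tokenizer().setInputCol("_c2").setOutputCol("words")
val tf = new HashingTF().setInputCol("words").setOutputCol("tf")
val idf = new IDF().setInputCol("tf").setOutputCol("tf-idf")
val df1 = tf.transform(tk.transform(df))
idf.fit(df1).transform(df1).select("tf-idf").show(10)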
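For reference, a minimal brute-force sketch of the search itself, under stated assumptions: cosine similarity is the dot product of two vectors divided by the product of their L2 norms, so an exact nearest neighbour search scores every document against the query and keeps the top k. The plain-Scala code below models sparse TF-IDF vectors as Map[Int, Double] (a hypothetical stand-in for Spark's SparseVector, which exposes `indices` and `values`); the object and method names are made up for illustration.

```scala
// Brute-force cosine-similarity nearest neighbours over sparse vectors.
// ASSUMPTION: a sparse vector is modelled as Map[termIndex -> weight],
// a hypothetical stand-in for Spark's SparseVector.
object CosineKnn {
  type Sparse = Map[Int, Double]

  // Euclidean (L2) norm of a sparse vector.
  def norm(v: Sparse): Double = math.sqrt(v.values.map(x => x * x).sum)

  // Dot product: walk the smaller vector and look up matching indices in the larger one.
  def dot(a: Sparse, b: Sparse): Double = {
    val (small, large) = if (a.size <= b.size) (a, b) else (b, a)
    small.iterator.map { case (i, x) => x * large.getOrElse(i, 0.0) }.sum
  }

  // Cosine similarity; defined here as 0.0 when either vector is all zeros.
  def cosine(a: Sparse, b: Sparse): Double = {
    val denom = norm(a) * norm(b)
    if (denom == 0.0) 0.0 else dot(a, b) / denom
  }

  // Score every document against the query and return the k best (id, similarity) pairs.
  def nearest(query: Sparse, corpus: Seq[(Long, Sparse)], k: Int): Seq[(Long, Double)] =
    corpus
      .map { case (id, v) => (id, cosine(query, v)) }
      .sortBy { case (_, sim) => -sim }
      .take(k)

  def main(args: Array[String]): Unit = {
    val corpus: Seq[(Long, Sparse)] = Seq(
      0L -> Map(0 -> 1.0, 1 -> 1.0),
      1L -> Map(0 -> 2.0, 1 -> 2.0),
      2L -> Map(2 -> 3.0)
    )
    val query: Sparse = Map(0 -> 1.0, 1 -> 1.0)
    nearest(query, corpus, 2).foreach { case (id, sim) => println(f"$id%d $sim%.3f") }
  }
}
```

In Spark itself, passing the `tf-idf` column through `org.apache.spark.ml.feature.Normalizer` with `setP(2.0)` makes every vector unit length, after which cosine similarity is just the dot product. Scoring all pairs this way is O(n²); for larger corpora, `BucketedRandomProjectionLSH` (Spark 2.1 and later) may give approximate neighbours on the normalized vectors, because for unit vectors ||a − b||² = 2 − 2 cos(a, b), so ranking by Euclidean distance matches ranking by cosine similarity.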
Thank you
--
*Meeraj Kunnumpurath*
*Director and Executive Principal*
*Service Symphony Ltd*
*00 44 7702 693597*