I am trying to write a script that computes the similarity between just two
documents. I wrote the code below, but when the data set contains only two
documents, the result is a 2x2 matrix, [[1,0],[0,1]]. With more than two
documents, the results are almost correct. Any suggestions?
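
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.preprocessing import Normalizer

    def lsa(doc1, doc2):
        dataset = [doc1, doc2]
        # TF-IDF weights, with English stop words removed
        vectorizer = TfidfVectorizer(stop_words='english')
        X = vectorizer.fit_transform(dataset)
        # Project onto (up to) 100 latent components
        lsa = TruncatedSVD(n_components=100)
        X = lsa.fit_transform(X)
        # Rescale each document vector to unit length
        X = Normalizer(copy=False).fit_transform(X)
        return cosine_similarity(X)
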
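For comparison, here is a minimal sketch of the plain cosine similarity between two documents, computed on raw term counts with no TF-IDF weighting, stop-word removal, or SVD. The helper name cosine_sim and the whitespace tokenization are my own assumptions for illustration, not part of scikit-learn. By construction it returns 0 only when the two documents share no terms, which is the kind of baseline I would compare the off-diagonal entries against.

```python
import math
from collections import Counter

def cosine_sim(doc1, doc2):
    """Cosine similarity of raw term-count vectors (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace
    a = Counter(doc1.lower().split())
    b = Counter(doc2.lower().split())
    # Dot product over the terms both documents contain
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # An empty document has zero norm; treat its similarity as 0
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Calling cosine_sim(doc1, doc2) on the same two strings passed to lsa gives a single number to set against the off-diagonal value of the matrix.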
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general