[Scikit-learn-general] LSA for documents similarity

Tasos Ventouris Sun, 29 Sep 2013 02:05:49 -0700

I am trying to write a script that computes the similarity between just two
documents. I wrote the code below, but when the data set contains only those two
documents, the result is the 2x2 identity matrix [[1,0],[0,1]]. If I use more
than two documents, the results are almost correct. Any suggestions?
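
A likely reason the two-document case collapses (offered here as an explanation, not confirmed in this thread): a TF-IDF matrix with only two rows has rank at most two, so its SVD has at most two non-zero singular values. Projecting onto all of them leaves every dot product between the documents unchanged. The LSA cosine then equals the plain TF-IDF cosine, and an identity matrix simply means the two documents share no terms after stop-word removal. The sketch below checks this with numpy.linalg.svd standing in for TruncatedSVD (both compute an uncentered SVD, and U * s corresponds to what fit_transform returns). The helper tfidf_vs_full_lsa and the sample sentences are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def tfidf_vs_full_lsa(docs):
    """Return (plain TF-IDF cosine matrix, cosine matrix after a full-rank SVD projection)."""
    X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()
    # Uncentered SVD, as TruncatedSVD computes it. With n documents there are at
    # most n non-zero singular values, so keeping all of them discards nothing.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    X_lsa = U * s  # document coordinates in the latent space (U * Sigma)
    return cosine_similarity(X), cosine_similarity(X_lsa)


# Hypothetical documents with no content words in common.
plain, lsa = tfidf_vs_full_lsa(["cats purr on warm laps", "rockets launch toward orbit"])
print(plain)
print(lsa)
```

Both printed matrices should be numerically the 2x2 identity. For two documents that do share words, the two matrices still agree with each other, so with only two documents in the fit, LSA has nothing to add beyond plain TF-IDF.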


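    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.preprocessing import Normalizer
    from sklearn.metrics.pairwise import cosine_similarity

    def lsa(doc1, doc2):
        dataset = [doc1, doc2]
        vectorizer = TfidfVectorizer(stop_words='english')
        X = vectorizer.fit_transform(dataset)
        # Only two rows are fitted here, so at most 2 components carry information;
        # recent scikit-learn versions also reject n_components > vocabulary size.
        svd = TruncatedSVD(n_components=100)
        X = svd.fit_transform(X)
        X = Normalizer(copy=False).fit_transform(X)
        return cosine_similarity(X)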
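
A common way around this (a suggestion, not an answer given in this thread) is to learn the TF-IDF weights and the SVD from a larger background corpus of related documents, and then only transform the two documents being compared. The latent components then reflect term co-occurrence across the whole corpus, so two documents can score as similar without sharing any word. In this sketch the corpus, the helper names fit_lsa and lsa_similarity, and n_components=2 are illustrative assumptions; a real corpus would have many more documents and components (the TruncatedSVD documentation recommends around 100 for LSA).

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer


def fit_lsa(corpus, n_components=100, random_state=0):
    """Fit TF-IDF -> truncated SVD -> L2 normalisation on a background corpus."""
    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        # n_components must not exceed the corpus vocabulary size.
        TruncatedSVD(n_components=n_components, random_state=random_state),
        Normalizer(copy=False),
    )
    model.fit(corpus)
    return model


def lsa_similarity(model, doc1, doc2):
    """Cosine similarity of two documents in a latent space fitted elsewhere."""
    X = model.transform([doc1, doc2])  # transform only: nothing is refitted
    return cosine_similarity(X)[0, 1]


# Hypothetical background corpus with two topics (pets and spaceflight).
corpus = [
    "cat and dog play in the garden",
    "the dog chased the cat",
    "a cat and a dog sleep together",
    "the rocket reached orbit",
    "a rocket launch put the satellite in orbit",
    "orbit insertion by rocket",
]
model = fit_lsa(corpus, n_components=2)
print(lsa_similarity(model, "cat", "dog"))     # no shared term, same topic
print(lsa_similarity(model, "cat", "rocket"))  # different topics
```

With plain TF-IDF, "cat" and "dog" have cosine similarity 0 because they share no term. After fitting on this corpus, their LSA similarity should come out close to 1 and "cat" versus "rocket" close to 0, because "cat" and "dog" co-occur in the training documents.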

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
