Dear All:

I am attempting to use kNN in a recommendation engine.  I am new to ML and new 
to scikit-learn.  I have found the community enormously helpful and was 
directed to submit my question to this mailing list instead of SO.  I searched 
questions asked in 2012 and 2013 for something similar to mine but didn't find 
anything, and I didn't see a rules section barring the ignorant from asking 
questions here, so I'll ask away.

As suggested in the subject line, I am attempting to perform a nearest 
neighbors analysis (unsupervised, I hope) on documents I collected from around 
the web.  I have read through scikit-learn's kNN documentation, I am working 
from the Nearest Centroid Example, and based on the notes section of that 
documentation I am confident that scikit-learn's kNN can classify text.

What I'm struggling with is getting the numerical values I have already 
retrieved from my documents (tf-idf vectors; cosine and Euclidean matrices) 
into a format that I can pass to the kNN algorithm.  Two specific questions:

        1. Format of X.  In the Nearest Centroid Example, X
           ("X=iris.data[:,:2]") is a 2D array that looks like this:
           [... [ 6.7  3.3] [ 6.7  3. ] [ 6.3  2.5] [ 6.5  3. ] [ 6.2  3.4]
           [ 5.9  3. ]].  Is it possible for me to transform my text-based
           values (likely either the similarity or the distance
           measurements) into a comparable 2D array?  If so, how?  If not,
           what would you recommend?  Can I still complete a nearest
           neighbors analysis on my documents?
        2. Value of y.  Just below X in the Nearest Centroid Example is y
           ("y=iris.target").  It appears to be a 1D array of nearest
           neighbor indices.  Is that the case?  If not, what is it?  And
           most importantly, whatever it is, how can I get a similar value
           out of my text documents?
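
To make the two questions concrete, here is a minimal sketch of the kind of 
pipeline I imagine.  The three-document corpus is made up for illustration, 
and the choice of TfidfVectorizer plus NearestNeighbors with brute-force 
cosine distance is my own assumption, not something taken from the Nearest 
Centroid Example.  I do not know whether this is the intended usage.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy corpus standing in for my collected documents.
docs = [
    "the cat sat on the mat",
    "a cat lay on a rug",
    "stock markets fell sharply today",
]

# My guess at X: a 2D (sparse) matrix with one row per document
# and one column per term, holding tf-idf weights.
X = TfidfVectorizer().fit_transform(docs)

# Unsupervised nearest neighbors: fit takes X only, with no y.
nn = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute")
nn.fit(X)

# For each document: cosine distances to, and row indices of,
# its 2 nearest documents (the document itself included).
distances, indices = nn.kneighbors(X)
print(X.shape)
print(indices)
```

If this is roughly right, then X would be the tf-idf matrix itself rather 
than a similarity or distance matrix, and no y would be needed for the 
unsupervised case.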


I apologize retroactively if these questions are blatantly obvious in the 
documentation and thank everyone in advance for any contribution to my 
understanding.  Thank you.

Mike
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general