Dear All:
I am attempting to use kNN in a recommendation engine. I am new to ML and new
to scikit-learn. I have found the community enormously helpful and was
directed to submit my question to this mailing list instead of Stack Overflow. I searched
questions asked in 2012 and 2013 for something similar to mine but didn't find
anything; and I didn't see a rules section barring the ignorant from asking
questions here, so I'll ask away.
As suggested in the subject line, I am attempting to perform a nearest
neighbors analysis (unsupervised, I hope) on documents I collected from around
the web. I have read through scikit-learn's kNN documentation, I am working from
the Nearest Centroid Example, and based on the notes section of that documentation I
am confident that scikit-learn's kNN can classify text.
What I'm struggling with is placing the numerical values I have retrieved
from my documents (tf-idf vectors; cosine and Euclidean matrices) into a
format that I can then pass to the kNN algorithm. Two
specific questions:
1. Format of X. In the Nearest Centroid Example, X
("X = iris.data[:, :2]") is a 2D array that looks like this: [... [6.7 3.3]
[6.7 3. ] [6.3 2.5] [6.5 3. ] [6.2 3.4] [5.9 3. ]]. Is it possible
for me to transform my text-based values (likely either the similarity or
distance measurement) into a comparable 2D array? If so, how so? If not, what
would you recommend? Can I still complete a nearest neighbors analysis on my
documents?
2. Value of y. Just below X in the Nearest Centroid Example is y
("y=iris.target"). It returns what appears to be a 1D array of nearest
neighbor indices. Is that the case? If not, what is it? And most
importantly, irrespective of its value, how can I get a similar value out of my
text documents?
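To make both questions concrete, here is a minimal sketch of what I think I am after. It assumes a recent scikit-learn in which NearestNeighbors accepts metric="cosine" together with algorithm="brute"; the four-sentence corpus is a made-up stand-in for my real documents, and I am not certain this is the recommended approach. It builds X as a tf-idf matrix (one row per document), fits the unsupervised NearestNeighbors on X alone with no y, and loads iris.target for comparison, which as far as I can tell holds one class label per sample rather than neighbor indices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in corpus; the real collected documents would go here.
docs = [
    "the cat sat on the mat",
    "a cat lay on the mat",
    "stock markets fell sharply today",
    "markets and stocks dropped today",
]

# X: a sparse 2D matrix with one row per document and one column per term.
X = TfidfVectorizer().fit_transform(docs)

# Unsupervised neighbor search: fit on X alone, no y is passed.
# Brute-force search with cosine distance works on sparse input.
nn = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute")
nn.fit(X)

# For each document: distances to, and row indices of, its 2 nearest
# neighbors. Because the queries are the training rows themselves,
# each document also shows up among its own neighbors.
distances, indices = nn.kneighbors(X)

# For comparison: y in the iris example is a 1D array of class labels
# (0, 1 or 2), one per sample.
iris = load_iris()
y = iris.target

print(X.shape)
print(indices)
print(y[:5])
```

If this is roughly right, the neighbor indices come out of kneighbors rather than going in as y, and a y like iris.target would only be needed for a supervised classifier.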
I apologize if the answers to these questions are blatantly obvious in the
documentation, and I thank everyone in advance for any contribution to my
understanding. Thank you.
Mike