On 11 March 2012 20:35, Robert Layton <[email protected]> wrote:
> Hi All,
>
> On reading some research, it appears that the shrunken centroid classifier
> is one of the better methods for authorship analysis.
> Therefore, I'm going to implement it and see if it really is, and I was
> planning to add it to scikits.learn.
>
> Before I start, I wanted to make sure it wasn't already in scikits.learn
> under a different name (as I don't do much classification, I am not sure).
> The method is basically like k-means clustering:
> training: each class is represented by its centroid
> testing: instances are assigned to the nearest centroid.

I have it in a branch:

https://github.com/ogrisel/scikit-learn/tree/nearest-centroid

There are no tests and no docs yet. It works quite well on the Olivetti
faces but very badly on the 20 newsgroups text data, which is kind of
unexpected since k-means is able to cluster that text data quite well.
Investigating why it does badly on high-dimensional sparse data might
help us better understand the nature of text data.
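
For reference, here is a minimal sketch of the plain nearest-centroid idea
described in the quoted message (one centroid per class at training time,
nearest centroid at test time). It is not the code from the branch: the
function names, the toy data and the choice of Euclidean distance are all
assumptions made for illustration, and the shrinkage step of the shrunken
centroid classifier is left out.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Training: represent each class by the mean of its samples."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        # Accumulate the per-feature sum for this class.
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, X):
    """Testing: assign each sample to the class of the nearest centroid."""
    # Euclidean distance is an assumption; other metrics could be swapped in.
    return [min(centroids, key=lambda label: math.dist(row, centroids[label]))
            for row in X]


# Toy data, made up for illustration: two well-separated 2-D classes.
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y_train = ["a", "a", "b", "b"]

centroids = fit_centroids(X_train, y_train)
labels = predict(centroids, [[0.2, 0.4], [5.5, 4.0]])
```

On dense, low-dimensional data like this the class means are well
separated, so the nearest-centroid rule is easy to check by hand.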

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
