On 11 March 2012 20:35, Robert Layton <[email protected]> wrote:
> Hi All,
>
> On reading some research, it appears that the shrunken centroid classifier
> is one of the better methods for authorship analysis.
> Therefore, I'm going to implement it and see if it really is, and I was
> planning to add it to scikits.learn.
>
> Before I start, I wanted to make sure it wasn't already in scikits.learn
> under a different name (as I don't do much classification, I am not sure).
> The method is basically like k-means clustering:
> training: each class is represented by its centroid
> testing: instances are assigned to the nearest centroid.

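For reference, the train/test procedure described in the quoted message can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not code from scikit-learn or from any branch: the class name `SimpleNearestCentroid` and the toy data are hypothetical, the distance is assumed to be Euclidean, and no centroid shrinkage is applied (so it is the plain nearest-centroid rule, not the shrunken variant).

```python
import math
from collections import defaultdict


class SimpleNearestCentroid:
    """Hypothetical illustration: one centroid per class, Euclidean distance."""

    def fit(self, X, y):
        # Group the training rows by class label.
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # Training: each class is represented by the mean of its rows.
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        # Testing: assign each instance to the class of the nearest centroid.
        return [
            min(self.centroids_, key=lambda c: math.dist(row, self.centroids_[c]))
            for row in X
        ]


# Toy data (made up for this sketch): two well-separated classes.
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y_train = ["a", "a", "b", "b"]
clf = SimpleNearestCentroid().fit(X_train, y_train)
predictions = clf.predict([[0.2, 0.4], [5.5, 4.0]])
```

Unlike k-means, the centroids here are computed once from the labels rather than refined iteratively, so training is a single pass over the data.
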
I have it in a branch:

https://github.com/ogrisel/scikit-learn/tree/nearest-centroid

There are no tests and no docs yet. It works quite well on the Olivetti faces but very badly on the 20 newsgroups text data, which is unexpected since k-means is able to cluster that text data quite well. Investigating why it performs badly on high-dimensional sparse data might help us better understand the nature of text data.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
