Hi Robert.
To me, this sounds somewhat like Linear Discriminant Analysis, or rather
Quadratic Discriminant Analysis (without the shrinking part).

In these methods, a Gaussian is fitted to each class and classification
is done by finding the Gaussian that most likely created a data point.
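A minimal NumPy sketch of that idea, assuming dense numeric features and enough samples per class for a non-singular covariance estimate. It fits one Gaussian (mean, covariance) plus a prior per class and predicts the class with the highest log-likelihood plus log-prior. The function names are hypothetical, and this is plain QDA with no shrinkage or regularization.

```python
import numpy as np


def fit_gaussians(X, y):
    """Fit one Gaussian (mean, covariance) and a prior per class, as in QDA."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)  # per-class covariance, no shrinkage
        prior = len(Xc) / len(X)
        params[c] = (mean, cov, prior)
    return params


def log_gaussian(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each row to the mean.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def predict(X, params):
    """Pick the class whose Gaussian most likely generated each point."""
    classes = np.array(list(params))
    scores = np.column_stack(
        [log_gaussian(X, m, S) + np.log(p) for m, S, p in params.values()]
    )
    return classes[np.argmax(scores, axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
pred = predict(np.array([[0.0, 0.0], [5.0, 5.0]]), fit_gaussians(X, y))
print(pred)  # [0 1]
```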

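This is basically the same as computing the mean of each class and
assigning each point to the nearest mean using the Mahalanobis distance
(exactly so for LDA, which uses one shared covariance, when the class priors are equal).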
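A small sketch of the nearest-mean view, assuming the LDA setting: one covariance matrix pooled over all classes and equal class priors (with unequal priors LDA adds a log-prior term that this ignores). The helper names are hypothetical; the data uses correlated noise so that Mahalanobis and Euclidean distance actually differ.

```python
import numpy as np


def fit_means_pooled_cov(X, y):
    """Class means plus a single pooled within-class covariance (as in LDA)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Subtract each sample's own class mean, then pool across classes.
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    return classes, means, cov


def predict_nearest_mean(X, classes, means, cov):
    """Assign each row of X to the class mean with smallest Mahalanobis distance."""
    cov_inv = np.linalg.inv(cov)
    diffs = X[:, None, :] - means[None, :, :]  # shape (n_samples, n_classes, d)
    d2 = np.einsum("nkd,de,nke->nk", diffs, cov_inv, diffs)
    return classes[np.argmin(d2, axis=1)]


rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0], [1.5, 0.5]])  # mixes the noise to correlate features
X = np.vstack([
    rng.normal(size=(100, 2)) @ A.T,
    rng.normal(size=(100, 2)) @ A.T + [4.0, 4.0],
])
y = np.array([0] * 100 + [1] * 100)
classes, means, cov = fit_means_pooled_cov(X, y)
pred = predict_nearest_mean(np.array([[0.0, 0.0], [4.0, 4.0]]), classes, means, cov)
print(pred)  # [0 1]
```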

I didn't look at the paper, but it sounds closely related.

There is no probabilistic way to get the feature-selection shrinking in this framework,
I guess, but of course you can always just set entries of the mean to zero.


Maybe you can take a closer look at these methods and work out
what the differences are.

Hope that helps,
Andy


On 03/12/2012 04:35 AM, Robert Layton wrote:
Hi All,

On reading some research, it appears that the shrunken centroid classifier <http://www-stat.stanford.edu/%7Etibs/PAM/Rdist/howwork.html> is one of the better methods for authorship analysis. Therefore, I'm going to implement it and see whether it really is, and I was planning to add it to scikits.learn.

Before I start, I wanted to make sure it wasn't already in scikits.learn under a different name (as I don't do much classification, I am not sure).
The method is basically like k-means clustering:
- training: each class is represented by its centroid;
- testing: instances are assigned to the nearest centroid.
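For illustration, a minimal NumPy sketch of those two steps, using Euclidean distance and no shrinkage. The class name `NearestCentroidSketch` is hypothetical; it only borrows the scikit-learn `fit`/`predict` naming convention.

```python
import numpy as np


class NearestCentroidSketch:
    """Plain nearest-centroid classifier (no shrinkage), Euclidean distance."""

    def fit(self, X, y):
        # Training: represent each class by the mean (centroid) of its samples.
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Testing: assign each instance to the class of its nearest centroid.
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[np.argmin(d2, axis=1)]


X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 1, 1])
clf = NearestCentroidSketch().fit(X, y)
print(clf.predict(np.array([[1.0, 1.0], [3.0, 3.0]])))  # [0 1]
```

Unlike k-means, there is no iteration: the labels are known, so each centroid is computed once.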

That is nearest centroid classification, while the "shrunken" part is basically feature selection. Each class centroid is moved towards the dataset centroid (taken as the origin, i.e. 0) by a threshold value. If any feature would cross over zero, it is set to zero instead, effectively eliminating some features from the classification.
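A sketch of the shrinking step as described here, i.e. soft thresholding of each class centroid's offset from the dataset centroid. It assumes the class centroids and overall centroid are already computed; the PAM page linked above additionally scales these offsets by per-feature within-class standard deviations, which this simplified version omits. The helper names are hypothetical.

```python
import numpy as np


def shrink_centroids(centroids, overall, delta):
    """Move each class centroid toward the overall centroid by delta.

    Offsets that would cross zero are clamped to exactly zero
    (soft thresholding).
    """
    offsets = centroids - overall  # dataset centroid acts as the origin
    shrunk = np.sign(offsets) * np.maximum(np.abs(offsets) - delta, 0.0)
    return overall + shrunk, shrunk


def active_features(shrunk_offsets):
    """Indices of features that are non-zero for at least one class."""
    return np.flatnonzero(np.any(shrunk_offsets != 0, axis=0))


overall = np.zeros(3)
centroids = np.array([[3.0, 0.5, -1.5], [-3.0, -0.5, 1.5]])
new_centroids, shrunk = shrink_centroids(centroids, overall, delta=1.0)
print(active_features(shrunk))  # [0 2]
```

Feature 1 is shrunk to zero for every class, so it no longer affects which centroid is nearest and is effectively dropped.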

In my brief research on the subject, I've seen two types of threshold. The first is the absolute amount to move each centroid towards the dataset centroid (e.g. 2.0 units), while the second is the number of features to reduce each centroid to.
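A sketch of how the two kinds of threshold can relate, assuming the simplified soft thresholding above (offsets of each class centroid from the dataset centroid, no per-feature scaling). An absolute threshold is used directly as `delta`; "keep n features" can be turned into a `delta` by ranking features by their largest absolute offset over the classes. The helper names are hypothetical, and this is one possible mapping, not necessarily the one the papers use.

```python
import numpy as np


def n_active(offsets, delta):
    """Number of features that survive soft thresholding at this delta."""
    # A feature survives if its largest |offset| over the classes exceeds delta.
    return int(np.sum(np.abs(offsets).max(axis=0) > delta))


def delta_for_n_features(offsets, n_features):
    """Smallest delta that leaves at most n_features active features."""
    scores = np.sort(np.abs(offsets).max(axis=0))[::-1]  # descending
    if n_features >= len(scores):
        return 0.0
    # Thresholding at the (n_features + 1)-th largest score removes it and
    # everything below it; ties can leave fewer than n_features.
    return float(scores[n_features])


offsets = np.array([[2.0, 0.3, -1.0], [-2.0, -0.3, 1.0]])
delta = delta_for_n_features(offsets, 2)
print(delta, n_active(offsets, delta))  # 0.3 2
```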

My question is: does scikits.learn have anything already? If not, I'll start working on it soon.

Thanks,

Robert

--

Public key at: http://pgp.mit.edu/ Search for this email address and select the key from "2011-08-19" (key id: 54BA8735)


------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
