Hi Robert.
To me, this sounds somewhat like Linear Discriminant Analysis, or rather
Quadratic Discriminant Analysis (without the shrinking part).
In these methods, a Gaussian is fitted to each class and classification
is done by finding the Gaussian that most likely created a data point.
This is basically the same as finding the mean of each class and
classifying each point to the nearest mean using Mahalanobis distance.
I didn't look at the paper, but it sounds quite related.
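A minimal numpy sketch of that equivalence, assuming the LDA setting: one covariance matrix shared by all classes and equal class priors. With a separate covariance per class (QDA), an extra log-determinant term enters and the two rules can disagree. The function names `fit_gaussians`, `mahalanobis_sq`, `predict_max_likelihood` and `predict_nearest_mean` are hypothetical, not scikit-learn API.

```python
import numpy as np

# Hypothetical helpers for illustration; assumes one covariance shared by all
# classes and equal class priors (the LDA setting), not per-class ones (QDA).

def fit_gaussians(X, y):
    """Estimate one mean per class and a single pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Subtract each sample's own class mean, then pool the scatter over classes.
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    return classes, means, cov

def mahalanobis_sq(X, mu, inv_cov):
    """Squared Mahalanobis distance from every row of X to mu."""
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

def predict_max_likelihood(X, classes, means, cov):
    """Class whose Gaussian gives each sample the highest log-density."""
    inv_cov = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    const = logdet + X.shape[1] * np.log(2 * np.pi)  # identical for every class here
    log_dens = [-0.5 * (mahalanobis_sq(X, mu, inv_cov) + const) for mu in means]
    return classes[np.argmax(log_dens, axis=0)]

def predict_nearest_mean(X, classes, means, cov):
    """Class whose mean is closest to each sample in Mahalanobis distance."""
    inv_cov = np.linalg.inv(cov)
    dists = [mahalanobis_sq(X, mu, inv_cov) for mu in means]
    return classes[np.argmin(dists, axis=0)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, (50, 2)),
               rng.normal([3.0, 1.0], 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
params = fit_gaussians(X, y)
# Compare the two decision rules sample by sample.
print((predict_max_likelihood(X, *params) == predict_nearest_mean(X, *params)).all())
```

Because the shared-covariance constant is the same for every class, maximizing the log-density and minimizing the Mahalanobis distance pick the same class.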
I guess there is no probabilistic way to get the feature-selection shrinking
in this framework, but of course you can always just set entries of the
mean to zero.
Maybe you can take a closer look at these methods and work out
what the differences are.
Hope that helps,
Andy
On 03/12/2012 04:35 AM, Robert Layton wrote:
Hi All,
From some research I've been reading, the shrunken centroid classifier
<http://www-stat.stanford.edu/%7Etibs/PAM/Rdist/howwork.html> appears to be
one of the better methods for authorship analysis.
I'm therefore going to implement it and see if it really is, and I plan
to add it to scikits.learn.
Before I start, I wanted to make sure it wasn't already in
scikits.learn under a different name (I don't do much
classification, so I am not sure).
The method is basically like k-means clustering:
training: each class is represented by its centroid
testing: instances are assigned to the nearest centroid.
That part is nearest centroid classification, while the "shrunken" bit
is basically feature selection.
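Since the labels are known, there is no iteration as in k-means: training is a single averaging step. A minimal numpy sketch of the plain (unshrunken) version, assuming Euclidean distance; the helper names are hypothetical, and PAM itself also standardizes features and adds a class-prior term, which this omits.

```python
import numpy as np

# Hypothetical helper names; plain (unshrunken) nearest centroid, Euclidean distance.

def fit_centroids(X, y):
    """Training: represent each class by the mean (centroid) of its samples."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_nearest_centroid(X, classes, centroids):
    """Testing: assign each sample to the class of its nearest centroid."""
    # Squared Euclidean distances, shape (n_samples, n_classes).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array(["a", "a", "b", "b"])
classes, centroids = fit_centroids(X, y)
print(predict_nearest_centroid(np.array([[1.0, 0.0], [4.0, 5.0]]), classes, centroids))
# prints ['a' 'b']
```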
Each centroid is moved towards the dataset centroid (treated as the
origin, i.e. 0) by a threshold value. If a feature would cross over zero,
it is set to zero instead; a feature that ends up at zero for every class
is effectively eliminated from the classification.
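This "move towards zero, but stop at zero" step is soft thresholding. A simplified sketch, assuming raw (unstandardized) centroid offsets; PAM additionally scales the offsets by a per-feature within-class spread before thresholding, which is left out here. `shrink_centroids` is a hypothetical name.

```python
import numpy as np

# Simplified sketch (hypothetical helper): soft-thresholds raw centroid offsets.
# PAM also standardizes the offsets by per-feature within-class spread (omitted).

def shrink_centroids(X, y, delta):
    """Move each class centroid towards the overall centroid by delta per feature."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    d = centroids - overall  # offsets, with the dataset centroid as the origin
    # Soft thresholding: shrink |d| by delta; anything that would cross 0 becomes 0.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    # A feature drops out when every class's shrunken offset for it is 0: all
    # centroids then coincide in that feature, so it cannot change the nearest one.
    active = np.any(d_shrunk != 0, axis=0)
    return classes, overall + d_shrunk, active

X = np.array([[0.0, 2.0], [0.0, 2.0], [4.0, 3.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])
classes, shrunk, active = shrink_centroids(X, y, delta=1.0)
print(shrunk)   # centroids [1, 2.5] and [3, 2.5]
print(active)   # [ True False]: feature 1 is eliminated
```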
In my short research on the subject, I've seen two types of threshold.
The first is the absolute amount to move each centroid towards the dataset
centroid (e.g. 2.0 units), while the second is the number of features
to reduce each centroid to.
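One possible reading of the second threshold type, sketched under assumptions: each class gets its own threshold, set to the (n_keep + 1)-th largest offset magnitude in that class, so at most `n_keep` offsets survive soft thresholding. PAM itself uses a single threshold shared by all classes, so a "number of features" target there would instead pick one global value. `shrink_to_n_features` and `n_keep` are hypothetical names.

```python
import numpy as np

# Assumed interpretation: a per-class threshold chosen so that at most n_keep
# of that class's offsets stay non-zero. Helper name is hypothetical.

def shrink_to_n_features(d, n_keep):
    """Soft-threshold each row of the offset matrix d, keeping <= n_keep non-zeros."""
    d = np.asarray(d, dtype=float)
    if n_keep >= d.shape[1]:
        return d.copy()  # nothing to remove
    # Magnitudes sorted in descending order within each row (each class).
    mags = -np.sort(-np.abs(d), axis=1)
    # Per-class threshold: the (n_keep + 1)-th largest magnitude. Only entries
    # strictly larger than it survive (entries tied with it are zeroed too).
    delta = mags[:, n_keep]
    return np.sign(d) * np.maximum(np.abs(d) - delta[:, None], 0.0)

d = np.array([[3.0, -1.0, 0.5],
              [-2.0, 0.25, 4.0]])
print(shrink_to_n_features(d, n_keep=1))  # one non-zero entry per row
```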
My question is: does scikits.learn already have anything like this? If
not, I'll start working on it soon.
Thanks,
Robert
--
Public key at: http://pgp.mit.edu/ Search for this email address and
select the key from "2011-08-19" (key id: 54BA8735)
------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general