On Jan 8, 2007, at 10:51 AM, Tom Fawcett wrote:

Just to add a note here: Ken is correct -- both NB and SVMs are known to be rather poor at providing accurate probabilities; their scores tend to be too extreme. Producing good probabilities from these scores is called calibrating the classifier, and it's more involved than just taking a root of the score. There are several methods for calibrating scores. The good news is that there's an effective one called isotonic regression (also known as Pool Adjacent Violators, or PAV) which is pretty easy and fast. The bad news is that there's no plug-in (i.e., CPAN-ready) Perl implementation of it (I've got a simple implementation which I should convert and contribute someday).
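
For concreteness, here's a rough sketch of the PAV idea in Perl -- this isn't the implementation I mentioned, just a toy illustration (pav_calibrate() and the example data are made up). It sorts the (score, label) pairs by score and repeatedly merges adjacent blocks whose averages violate monotonicity; the result is a nondecreasing step function mapping score ranges to probabilities.

#!/usr/bin/perl
use strict;
use warnings;

# Pool Adjacent Violators: turn (score, 0/1 label) pairs into a nondecreasing
# step function mapping score ranges to calibrated probabilities.
sub pav_calibrate {
    my @pairs = sort { $a->[0] <=> $b->[0] } @_;    # sort by raw classifier score

    my @stack;    # each block holds: label sum, count, and the score range it covers
    for my $p (@pairs) {
        push @stack, { sum => $p->[1], n => 1, lo => $p->[0], hi => $p->[0] };
        # Merge adjacent blocks while their means violate monotonicity.
        while (@stack >= 2
               and $stack[-2]{sum} / $stack[-2]{n} > $stack[-1]{sum} / $stack[-1]{n}) {
            my $cur  = pop @stack;
            my $prev = pop @stack;
            push @stack, {
                sum => $prev->{sum} + $cur->{sum},
                n   => $prev->{n}   + $cur->{n},
                lo  => $prev->{lo},
                hi  => $cur->{hi},
            };
        }
    }
    # Return [low score, high score, calibrated probability] for each block.
    return map { [ $_->{lo}, $_->{hi}, $_->{sum} / $_->{n} ] } @stack;
}

# Toy example: overconfident scores paired with their true 0/1 labels.
my @data = ( [0.02, 0], [0.10, 0], [0.15, 1], [0.30, 0], [0.55, 1], [0.80, 1], [0.95, 1] );
for my $block ( pav_calibrate(@data) ) {
    printf "scores %.2f..%.2f -> p = %.3f\n", @$block;
}

On this toy input the third and fourth points get pooled into a single block with probability 0.5, and everything else stays put, so the output is monotone in the score.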

If you want to read about classifier calibration, google one of these titles:

"Transforming classifier scores into accurate multiclass probability estimates"
by Bianca Zadrozny and Charles Elkan

"Predicting Good Probabilities With Supervised Learning"
by Alexandru Niculescu-Mizil and Rich Caruana


Cool, thanks for the references. It might be nice to add some such scheme to Algorithm::NaiveBayes (and friends), so that the user has a choice of several normalization schemes, including "none". If I get a surplus of tuits I'll add it, or if you feel like contributing your stuff, that would be great too.
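
Just thinking out loud, something like this is roughly what I have in mind -- a calibration step layered on top of the module's predict() output, reusing the pav_calibrate() sketch above. The toy data, the 'spam'/'ham' labels, and calibrated_prob() are all made up for illustration; nothing like this exists in Algorithm::NaiveBayes today.

#!/usr/bin/perl
use strict;
use warnings;
use Algorithm::NaiveBayes;

my $nb = Algorithm::NaiveBayes->new;

# Toy training set: word counts labelled 'spam' or 'ham'.
my @train = (
    [ { viagra => 2, cheap => 1 },   'spam' ],
    [ { meeting => 1, agenda => 1 }, 'ham'  ],
    [ { cheap => 1, pills => 1 },    'spam' ],
    [ { lunch => 1, meeting => 1 },  'ham'  ],
);
$nb->add_instance(attributes => $_->[0], label => $_->[1]) for @train;
$nb->train;

# Fit the calibration on held-out data: (raw spam score, 0/1 label) pairs
# fed to pav_calibrate() -- the sub from the sketch earlier in the thread.
my @heldout = (
    [ { cheap => 1 },  'spam' ],
    [ { agenda => 1 }, 'ham'  ],
);
my @steps = pav_calibrate(
    map {
        my $scores = $nb->predict(attributes => $_->[0]);
        [ $scores->{spam} || 0, $_->[1] eq 'spam' ? 1 : 0 ];
    } @heldout
);

# Map a raw score onto the step function: take the last block whose low end
# the score reaches.
sub calibrated_prob {
    my ($score, @steps) = @_;
    my $p = $steps[0][2];
    for my $s (@steps) {
        $p = $s->[2] if $score >= $s->[0];
    }
    return $p;
}

my $raw = $nb->predict(attributes => { cheap => 1, pills => 1 })->{spam} || 0;
printf "raw spam score %.3f -> calibrated %.3f\n", $raw, calibrated_prob($raw, @steps);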

 -Ken
